Abstract

A major enterprise in compressed sensing and sparse approximation is the design and analysis
of computationally tractable algorithms for recovering sparse, exact or approximate, solutions of
underdetermined linear systems of equations. Many such algorithms have now been proven to
have optimal-order uniform recovery guarantees using the ubiquitous Restricted Isometry Property
(RIP) [10]. However, it is unclear when the RIP-based sufficient conditions on the algorithm are
satisfied. We present a framework in which this task can be achieved, translating these conditions for
Gaussian measurement matrices into requirements on the signal's sparsity level, length, and number
of measurements. We illustrate this approach on three of the state-of-the-art greedy algorithms:
CoSaMP [29], Subspace Pursuit (SP) [12] and Iterative Hard Thresholding (IHT) [7]. Designed
to allow a direct comparison of existing theory, our framework implies that, according to the best
known bounds, IHT requires the fewest compressed sensing measurements and has the
lowest per-iteration computational cost of the three algorithms compared here.

Keywords: Compressed sensing, greedy algorithms, sparse solutions to underdetermined systems, 
restricted isometry property, phase transitions, Gaussian matrices. 

1. Introduction

In compressed sensing [Q].I10^ fTT] . one works under the sparse approximation assumption, namely, 
that signals/vectors of interest can be well approximated by few components of a known basis. This 
assumption is often satisfied due to constraints imposed by the system which generates the signal. 
In this setting, it has been proven (originally in [10, 17] and by many others since) that the number
of linear observations of the signal, required to guarantee recovery, need only be proportional to the 
sparsity of the signal's approximation. This is in stark contrast to the standard Shannon-Nyquist 
Sampling paradigm [HH] where worst-case sampling requirements are imposed. 

In the simplest setting, consider measuring a vector $x_0 \in \mathbb{R}^N$ which either has exactly $k < N$
nonzero entries, or which has $k$ entries whose magnitudes are dominant. Let $A$ be an $n \times N$ matrix
with $n < N$ which we use to measure $x_0$; the $n$ inner products with $x_0$ are the entries in $y = Ax_0$.
From knowledge of $y$ and $A$ one seeks to recover the vector $x_0$, or a suitable approximation thereof
[8]. Let $\chi^N(k) := \{x \in \mathbb{R}^N : \|x\|_0 \le k\}$ denote the family of at most $k$-sparse vectors in $\mathbb{R}^N$, where
$\|\cdot\|_0$ counts the number of nonzero entries. From $y$ and $A$, the optimal $k$-sparse signal is the solution of

$$\min_{x \in \chi^N(k)} \|Ax - y\|_2, \tag{1}$$

where $\|\cdot\|_2$ denotes the Euclidean norm.



However, solving (1) via a naive exhaustive search is combinatorial in nature and NP-hard
[28]. A major aspect of compressed sensing theory is the study of alternative methods for solving
(1). Since the system $y = Ax$ is underdetermined, any successful recovery of $x$ will require some
form of nonlinear reconstruction. Under certain conditions, various algorithms have been shown
to successfully reduce (1) to a tractable problem, one with a computational cost which is a low
degree polynomial of the problem dimensions, rather than the exponential cost associated with
a direct combinatorial search for the solution of (1). While there are numerous reconstruction
algorithms, they each generally fall into one of three categories: greedy methods, regularizations, or
combinatorial group testing. For an in-depth discussion of compressed sensing recovery algorithms,
see [29] and references therein.
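To make the combinatorial cost concrete, the following is a minimal sketch of the naive exhaustive search for the optimal $k$-sparse least-squares solution. The function name `exhaustive_sparse_solve` is hypothetical and NumPy is assumed; the loop visits every one of the $\binom{N}{k}$ candidate supports, which is what makes the approach impractical beyond toy sizes.

```python
from itertools import combinations

import numpy as np


def exhaustive_sparse_solve(A, y, k):
    """Brute-force minimizer of ||Ax - y||_2 over vectors with at most k nonzeros.

    Illustrative only: the loop runs over all binom(N, k) supports.
    """
    n, N = A.shape
    best_x = np.zeros(N)
    best_res = np.linalg.norm(y)  # residual of the zero vector
    for support in combinations(range(N), k):
        cols = list(support)
        # Least-squares fit restricted to the candidate support.
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        res = np.linalg.norm(A[:, cols] @ coef - y)
        if res < best_res:
            best_x = np.zeros(N)
            best_x[cols] = coef
            best_res = res
    return best_x, best_res
```

For instance, with $N = 1000$ and $k = 10$ the loop would need roughly $2.6 \times 10^{23}$ least-squares solves, which motivates the tractable alternatives studied in this paper.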

The first uniform guarantees for exact reconstruction of every $x \in \chi^N(k)$, for a fixed $A$, came
from $\ell_1$-regularization. In this case, (1) is relaxed to solving the problem

$$\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 < \gamma, \tag{2}$$

for some known noise level, or decreasing, $\gamma$. $\ell_1$-regularization has been extensively studied, see
the pioneering works [TOj, [IT]; also, see [TBI [211 E] for results analogous to those presented here. In 
this paper, we focus on three illustrative greedy algorithms, Compressive Sampling Matching Pursuit
(CoSaMP) [29], Subspace Pursuit (SP) [12], and Iterative Hard Thresholding (IHT) [7], which boast
similar uniform guarantees of successful recovery of sparse signals when the measurement matrix A 
satisfies the now ubiquitous Restricted Isometry Property (RIP) [10\ 16], The three algorithms are 
deeply connected and each has some advantage over the others. These algorithms are essentially
support set recovery algorithms which use hard thresholding to iteratively update the approximate 
support set; their differences lie in the magnitude of the application of hard thresholding and the 
vectors to which the thresholding is applied, [141 137j . The algorithms are restated in the next section. 
Other greedy methods with similar guarantees are available, see for example |11[ I27j; several other 
greedy techniques have been developed ([231 EQl [13], etc.), but their theoretical analyses do not 
currently subscribe to the above uniform framework. 

As briefly mentioned earlier, the intriguing aspect of compressed sensing is its ability to recover
$k$-sparse signals when the number of measurements required is proportional to the sparsity, $n \sim k$,
as the problem size grows, $n \to \infty$. Each of the algorithms discussed here exhibits a phase transition
property, where there exists a $k^*$ such that for any $\epsilon > 0$, as $k^*, n \to \infty$, the algorithm successfully
recovers all $k$-sparse vectors provided $k < (1 - \epsilon)k^*$ and does not recover all $k$-sparse vectors if
$k > (1 + \epsilon)k^*$. For a description of phase transitions in the context of compressed sensing, see [15],
while for numerical average-case phase transitions for greedy algorithms, see [14]. We consider the
asymptotic setting where $k$ and $N$ grow proportionally with $n$, namely, $(k, n, N) \to \infty$ with the
ratios $k/n \to \rho$, $n/N \to \delta$ as $n \to \infty$ for $(\delta, \rho) \in (0,1)^2$; also, we assume the matrix $A$ is drawn i.i.d. from
$\mathcal{N}(0, n^{-1})$, the normal distribution with mean 0 and variance $n^{-1}$. In this framework, we develop
lower bounds on the phase transition for exact recovery of all $k$-sparse signals. These bounds provide
curves in the unit square, $(\delta, \rho) \in (0,1)^2$, below which there is an exponentially high probability
on the draw of the Gaussian matrix $A$ that $A$ will satisfy the sufficient RIP conditions, so that the
algorithm solves (1). We utilize a more general, asymmetric version of the RIP, see Definition 1, to compute as
precise a lower bound on the phase transitions as possible. This phase transition framework allows
a direct comparison of the provable recovery regions of different algorithms in terms of the problem
instance $(k/n, n/N)$. We then compare the guaranteed recovery capabilities of these algorithms to the
guarantees of $\ell_1$-regularization proven via RIP analysis. For $\ell_1$-regularization, this phase transition
framework has already been applied using the RIP [6], using the theory of convex polytopes [16],
and geometric functional analysis [35].

The aforementioned lower bounds on the algorithmic exact sparse recovery phase transitions
are presented in Theorems 10, 11, and 12. The curves are defined by the functions $\rho_S^{sp}(\delta)$ (SP; the
magenta curve in Fig. 1(a)), $\rho_S^{csp}(\delta)$ (CoSaMP; the black curve in Fig. 1(a)), and $\rho_S^{iht}(\delta)$ (IHT; the red
curve in Fig. 1(a)). For comparison, the analogous lower bound on the phase transition for
$\ell_1$-regularization, $\rho_S^{\ell_1}(\delta)$, is displayed as the blue curve in Fig. 1(a). From Fig. 1 we are able to directly
compare the provable recovery results of the three greedy algorithms as well as $\ell_1$-regularization.
For a given problem instance $(k,n,N)$ with the entries of $A$ drawn i.i.d. from $\mathcal{N}(0,n^{-1})$, if $k/n = \rho$
falls in the region below the curve $\rho_S^{alg}(\delta)$ associated to a specific algorithm, then with probability
approaching 1 exponentially in $n$, the algorithm will exactly recover the $k$-sparse vector $x \in \chi^N(k)$
no matter which $x \in \chi^N(k)$ was measured by $A$. These lower bounds on the phase transition can
also be interpreted as the minimum number of measurements known to guarantee recovery through
the constant of proportionality: $n > (\rho_S^{alg})^{-1} k$. Fig. 1(b) portrays the inverse of the lower bounds
on the phase transition. This gives a minimum possible value for $(\rho_S^{alg})^{-1}$. For example, from
the blue curve, for a Gaussian random matrix used in $\ell_1$-regularization, the minimum number
of measurements proven (using RIP) to be sufficient to ensure recovery of all $k$-sparse vectors is
$n > 317k$. By contrast, for greedy algorithms, the minimum number of measurements shown to be
sufficient is significantly larger: $n > 907k$ for IHT, $n > 3124k$ for SP, and $n > 4923k$ for CoSaMP.
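As a small worked example of how these constants of proportionality are read, the sketch below tabulates the quoted bounds and converts them into measurement counts for a given sparsity. The names `PROVEN_CONSTANTS`, `sufficient_measurements` and `implied_rho` are hypothetical; the numbers are the ones quoted above and correspond to the most favourable point on each curve, so they are not uniform in $\delta$.

```python
# Constants c from the text, in the sense "n > c * k is proven sufficient".
PROVEN_CONSTANTS = {"l1": 317, "IHT": 907, "SP": 3124, "CoSaMP": 4923}


def sufficient_measurements(algorithm, k):
    """Measurement count c * k that the quoted bound asks n to exceed."""
    return PROVEN_CONSTANTS[algorithm] * k


def implied_rho(algorithm):
    """Largest ratio k/n covered by the quoted bound (reciprocal of c)."""
    return 1.0 / PROVEN_CONSTANTS[algorithm]


if __name__ == "__main__":
    for name in PROVEN_CONSTANTS:
        print(name, sufficient_measurements(name, 10), implied_rho(name))
```

Read this way, a 10-sparse signal is covered by the IHT guarantee only once $n$ exceeds 9070, which illustrates how pessimistic the worst-case RIP analysis is compared with observed average-case behaviour.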

More precisely, the main contribution of this article is the derivation of theorems and corollaries
of the following form for each of the CoSaMP, SP, and IHT algorithms.

Theorem 1. Given a matrix $A$ with entries drawn i.i.d. from $\mathcal{N}(0, n^{-1})$, for any $x \in \chi^N(k)$,
let $y = Ax + e$ for some (unknown) noise vector $e$. For any $\epsilon \in (0,1)$, as $(k,n,N) \to \infty$ with
$n/N \to \delta \in (0,1)$ and $k/n \to \rho \in (0,1)$, there exist $\mu^{alg}(\delta,\rho)$ and $\rho_S^{alg}(\delta)$, the unique solution to
$\mu^{alg}(\delta,\rho) = 1$. If $\rho < (1-\epsilon)\rho_S^{alg}(\delta)$, there is an exponentially high probability on the draw of $A$ that
the output of the algorithm at the $l$th iteration, $\hat{x}$, approximates $x$ within the bound

$$\|\hat{x} - x\|_2 \le \kappa^{alg}(\delta,(1+\epsilon)\rho) \left[\mu^{alg}(\delta,(1+\epsilon)\rho)\right]^l \|x\|_2 + \frac{\xi^{alg}(\delta,(1+\epsilon)\rho)}{1 - \mu^{alg}(\delta,(1+\epsilon)\rho)} \|e\|_2, \tag{3}$$

for some $\kappa^{alg}(\delta,\rho)$ and $\xi^{alg}(\delta,\rho)$.


Figure 1: (a): The lower bounds on the strong exact recovery phase transition for Gaussian random matrices for the
algorithms $\ell_1$-regularization (Theorem 13, $\rho_S^{\ell_1}(\delta)$, blue), IHT (Theorem 12, $\rho_S^{iht}(\delta)$, red), SP (Theorem 11, $\rho_S^{sp}(\delta)$,
magenta), and CoSaMP (Theorem 10, $\rho_S^{csp}(\delta)$, black). (b): The inverse of the phase transition lower bounds in the
left panel (a).

Corollary 2. Given a matrix $A$ with entries drawn i.i.d. from $\mathcal{N}(0, n^{-1})$, for any $x \in \chi^N(k)$, let
$y = Ax$. For any $\epsilon \in (0,1)$, with $n/N \to \delta \in (0,1)$ and $k/n \to \rho < (1-\epsilon)\rho_S^{alg}(\delta)$ as $(k,n,N) \to \infty$,
there is an exponentially high probability on the draw of $A$ that the algorithm exactly recovers $x$
from $y$ and $A$ in a finite number of iterations not to exceed

$$\ell_{max}^{alg}(x) := \left\lceil \frac{\log \nu_{min}(x) - \log \kappa^{alg}(\delta,\rho)}{\log \mu^{alg}(\delta,\rho)} \right\rceil + 1, \tag{4}$$

where

$$\nu_{min}(x) := \frac{\min_{i \in T} |x_i|}{\|x\|_2}, \tag{5}$$

with $T := \{i : x_i \ne 0\}$ and $\lceil m \rceil$ the smallest integer greater than or equal to $m$.
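The iteration bound of Corollary 2 is easy to evaluate once the factors are known. The sketch below does so under two stated assumptions: that $\nu_{min}(x)$ is the smallest nonzero magnitude of $x$ relative to $\|x\|_2$, and that the caller supplies numerical values `kappa` and `mu` for $\kappa^{alg}$ and $\mu^{alg}$ (the function name `max_iterations` is hypothetical).

```python
import math


def max_iterations(x, kappa, mu):
    """Evaluate ceil((log nu_min - log kappa) / log mu) + 1 for a sparse vector x.

    Assumes nu_min is the smallest nonzero |x_i| divided by ||x||_2 and that
    0 < mu < 1, so the error bound contracts at every iteration.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("the bound requires a convergence factor 0 < mu < 1")
    nonzeros = [abs(v) for v in x if v != 0]
    if not nonzeros:
        raise ValueError("x must have at least one nonzero entry")
    norm2 = math.sqrt(sum(v * v for v in x))
    nu_min = min(nonzeros) / norm2
    ratio = (math.log(nu_min) - math.log(kappa)) / math.log(mu)
    return math.ceil(ratio) + 1
```

The bound grows as $\mu$ approaches 1 and as the smallest nonzero entry shrinks relative to the signal energy, which matches the intuition that small entries are the last to be identified.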



The factors $\mu^{alg}(\delta,\rho)$ and $\frac{\xi^{alg}}{1-\mu^{alg}}(\delta,\rho)$ for CoSaMP, SP, and IHT are displayed in Figure 2, while
formulae for their calculation are deferred to Section 3.

Corollary 2 implies that $\rho_S^{alg}(\delta)$ delineates the region in which the algorithm can be guaranteed
to converge provided there exists an $x \in \chi^N(k)$ such that $y = Ax$. However, if no such $x$ exists, as $\rho$
approaches $\rho_S^{alg}(\delta)$ the guarantees on the number of iterations required and the stability factors become
unbounded. Further bounds on the convergence factor $\mu^{alg}(\delta,\rho)$ and the stability factor $\frac{\xi^{alg}}{1-\mu^{alg}}(\delta,\rho)$
result in yet lower curves $\rho_S^{alg}(\delta; \text{bound})$ for a specified bound; recall that $\rho_S^{alg}(\delta)$ corresponds to the
bound $\mu^{alg}(\delta,\rho) = 1$.

In the next section, we recall the three algorithms and introduce necessary notation. Then
we present the asymmetric restricted isometry property and formulate weaker restricted isometry
conditions on a matrix $A$ that ensure the respective algorithm will successfully recover all $k$-sparse
signals. In addition to exact recovery, we study the known bounds on the behavior of the algorithms
in the presence of noisy measurements. In order to make quantitative comparisons of these results,
we must select a matrix ensemble for analysis. In Section 3, we present the lower bounds on the
phase transition for each algorithm when the measurement matrix is a Gaussian random matrix.
Phase transitions are developed in the case of exact sparse signals while bounds on the multiplicative
stability constants are also compared through associated level curves. Section 4 is a discussion of
our interpretation of these results and how to use this phase transition framework for comparison
of other algorithms.

For an index set $I \subset \{1, \ldots, N\}$, let $x_I$ denote the restriction of a vector $x \in \mathbb{R}^N$ to the set $I$,
i.e., $(x_I)_i = x_i$ for $i \in I$ and $(x_I)_j = 0$ for $j \notin I$. Also, let $A_I$ denote the submatrix of $A$ obtained by
selecting the columns of $A$ indexed by $I$. $A_I^*$ is the conjugate transpose of $A_I$ while $A_I^\dagger = (A_I^* A_I)^{-1} A_I^*$
is the pseudoinverse of $A_I$. In each of the algorithms, thresholding is applied by selecting $m$ entries
of a vector with largest magnitude; we refer to this as hard thresholding of magnitude $m$.
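The three notational building blocks just introduced (hard thresholding of magnitude $m$, restriction $x_I$, and the pseudoinverse $A_I^\dagger$) can be sketched with NumPy as follows. The function names are hypothetical, and the pseudoinverse is applied through a least-squares solve rather than by forming $(A_I^* A_I)^{-1}$ explicitly, which is a numerical design choice and not something prescribed by the text.

```python
import numpy as np


def hard_threshold_support(v, m):
    """Sorted indices of the m largest-magnitude entries of v."""
    return np.sort(np.argsort(np.abs(v))[-m:])


def restrict(x, I):
    """x_I: copy of x with every entry outside the index set I set to zero."""
    out = np.zeros_like(x)
    out[I] = x[I]
    return out


def pinv_apply(A, I, y):
    """Coefficients A_I^dagger y, computed as a least-squares solution."""
    coef, *_ = np.linalg.lstsq(A[:, I], y, rcond=None)
    return coef
```

For example, `hard_threshold_support(np.array([0.1, -3.0, 2.0, 0.5]), 2)` selects the positions of the entries $-3$ and $2$.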

2. Greedy Algorithms and the Asymmetric Restricted Isometry Property 

2.1. CoSaMP 

The CoSaMP recovery algorithm is a support recovery algorithm which applies hard thresholding
by selecting the $k$ largest entries of a vector obtained by applying a pseudoinverse to the
measurement $y$. In CoSaMP, the columns of $A$ selected for the pseudoinverse are obtained by
applying hard thresholding of magnitude $2k$ to $A^*$ applied to the residual from the previous iteration
and adding these indices to the approximate support set from the previous iteration. This
larger pseudoinverse matrix of size $2k \times n$ imposes the most stringent aRIP condition of the three
algorithms. However, CoSaMP uses one fewer pseudoinverse per iteration than SP as the residual
vector is computed with a direct matrix-vector multiply of size $n \times k$ rather than with an additional
pseudoinverse. Furthermore, when computing the output vector $\hat{x}$, CoSaMP does not need to apply
another pseudoinverse as does SP. See Algorithm 1.

Algorithm 1 CoSaMP [29]
Input: $A$, $y$, $k$
Output: A $k$-sparse approximation $\hat{x}$ of the target signal $x$
Initialization:
1: Set $T^0 = \emptyset$
2: Set $y^0 = y$
Iteration: During iteration $l$, do
1: $\tilde{T}^l = T^{l-1} \cup \{2k$ indices of largest magnitude entries of $A^* y^{l-1}\}$
2: $\tilde{x} = A^\dagger_{\tilde{T}^l} y$
3: $T^l = \{k$ indices of largest magnitude entries of $\tilde{x}\}$
4: $y^l = y - A_{T^l} \tilde{x}_{T^l}$
5: if $\|y^l\|_2 = 0$ then
6: return $\hat{x}$ defined by $\hat{x}_{\{1,\ldots,N\} - T^l} = 0$ and $\hat{x}_{T^l} = \tilde{x}_{T^l}$
7: else
8: Perform iteration $l + 1$
9: end if
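A minimal NumPy sketch of the steps of Algorithm 1 follows. It is an illustration under stated assumptions rather than the reference implementation of [29]: the exact test $\|y^l\|_2 = 0$ is replaced by a relative tolerance `tol`, a cap `max_iter` is added, and the pseudoinverse is applied via a least-squares solve; these parameter names and the function name `cosamp` are hypothetical.

```python
import numpy as np


def cosamp(A, y, k, max_iter=50, tol=1e-10):
    """Sketch of CoSaMP: returns a k-sparse approximation of the target signal."""
    n, N = A.shape
    y = np.asarray(y, dtype=float)
    T = np.array([], dtype=int)          # T^0 is empty
    residual = y.copy()                  # y^0 = y
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        # 2k indices of largest magnitude entries of A* y^{l-1}
        proxy = A.conj().T @ residual
        new = np.argsort(np.abs(proxy))[-2 * k:]
        T_tilde = np.union1d(T, new).astype(int)
        # x-tilde = pseudoinverse of A restricted to T-tilde, applied to y
        coef, *_ = np.linalg.lstsq(A[:, T_tilde], y, rcond=None)
        # keep the k largest magnitude entries of x-tilde
        keep = np.argsort(np.abs(coef))[-k:]
        T = T_tilde[keep]
        x_hat = np.zeros(N)
        x_hat[T] = coef[keep]
        # y^l = y - A_T x_T, a plain matrix-vector multiply
        residual = y - A[:, T] @ coef[keep]
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return x_hat
```

A typical call would be `x_hat = cosamp(A, A @ x0, k)` with `A` drawn as `rng.standard_normal((n, N)) / np.sqrt(n)`, matching the $\mathcal{N}(0, n^{-1})$ scaling used throughout.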



2.2. Subspace Pursuit 

The Subspace Pursuit algorithm is also a support recovery algorithm which applies hard thresholding
of magnitude $k$ to a vector obtained by applying a pseudoinverse to the measurements $y$.
The submatrix chosen for the pseudoinverse has its columns selected by applying $A^*$ to the residual
vector from the previous iteration, hard thresholding of magnitude $k$, and adding the indices
of the terms to the previous approximate support set. Compared to the other two algorithms, a
computational disadvantage of SP is that the aforementioned residual vector is also computed via a
pseudoinverse, this time selecting the columns from $A$ by again applying a hard threshold of magnitude
$k$. The computation of the approximation to the target signal also requires the application
of a pseudoinverse for a matrix of size $n \times k$. See Algorithm 2.

Algorithm 2 Subspace Pursuit [12]
Input: $A$, $y$, $k$
Output: A $k$-sparse approximation $\hat{x}$ of the target signal $x$
Initialization:
1: Set $T^0 = \{k$ indices of largest magnitude entries of $A^* y\}$
2: Set $y_r^0 = y - A_{T^0} A^\dagger_{T^0} y$
Iteration: During iteration $l$, do
1: $\tilde{T}^l = T^{l-1} \cup \{k$ indices of largest magnitude entries of $A^* y_r^{l-1}\}$
2: Set $\tilde{x} = A^\dagger_{\tilde{T}^l} y$
3: $T^l = \{k$ indices of largest magnitude entries of $\tilde{x}\}$
4: $y_r^l = y - A_{T^l} A^\dagger_{T^l} y$
5: if $\|y_r^l\|_2 = 0$ then
6: return $\hat{x}$ defined by $\hat{x}_{\{1,\ldots,N\} - T^l} = 0$ and $\hat{x}_{T^l} = A^\dagger_{T^l} y$
7: else
8: Perform iteration $l + 1$
9: end if
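The steps of Algorithm 2 can be sketched in the same style. As before this is a hedged illustration, not the implementation of [12]: the tolerance `tol`, the iteration cap `max_iter`, the helper `_ls` and the function name `subspace_pursuit` are assumptions, and the residual test is performed at the top of the loop so that an exact initial support terminates immediately.

```python
import numpy as np


def _ls(A, T, y):
    """Apply the pseudoinverse of A restricted to columns T via least squares."""
    coef, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
    return coef


def subspace_pursuit(A, y, k, max_iter=50, tol=1e-10):
    """Sketch of Subspace Pursuit: returns a k-sparse approximation."""
    n, N = A.shape
    y = np.asarray(y, dtype=float)
    # T^0: k largest magnitude entries of A* y, then the projected residual
    T = np.argsort(np.abs(A.conj().T @ y))[-k:]
    coef = _ls(A, T, y)
    residual = y - A[:, T] @ coef
    for _ in range(max_iter):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
        new = np.argsort(np.abs(A.conj().T @ residual))[-k:]
        T_tilde = np.union1d(T, new).astype(int)
        x_tilde = _ls(A, T_tilde, y)                      # first pseudoinverse
        T = T_tilde[np.argsort(np.abs(x_tilde))[-k:]]
        coef = _ls(A, T, y)                               # second pseudoinverse
        residual = y - A[:, T] @ coef
    x_hat = np.zeros(N)
    x_hat[T] = coef
    return x_hat
```

The two least-squares solves per iteration make visible the extra pseudoinverse that the text identifies as SP's computational disadvantage relative to CoSaMP and IHT.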



2.3. Iterative Hard Thresholding 

Iterative Hard Thresholding (IHT) is also a support recovery algorithm. However, IHT applies
hard thresholding to an approximation of the target signal, rather than to the residuals. This
completely eliminates the use of a pseudoinverse, reducing the computational cost per iteration.
In particular, hard thresholding of magnitude $k$ is applied to an updated approximation of the
target signal, $x$, obtained by matrix-vector multiplies of size $n \times N$ that represent a move by a fixed
stepsize $\omega$ along the steepest descent direction from the current iterate for the residual $\|Ax - y\|_2^2$.
See Algorithm 3.

Algorithm 3 Iterative Hard Thresholding [7]
Input: $A$, $y$, $\omega \in (0,1)$, $k$
Output: A $k$-sparse approximation $\hat{x}$ of the target signal $x$
Initialization:
1: Set $x^0 = 0$
2: Set $T^0 = \emptyset$
3: Set $y^0 = y$
Iteration: During iteration $l$, do
1: $x^l = x^{l-1}_{T^{l-1}} + \omega A^* y^{l-1}$
2: $T^l = \{k$ indices of largest magnitude entries of $x^l\}$
3: $y^l = y - A_{T^l} x^l_{T^l}$
4: if $\|y^l\|_2 = 0$ then
5: return $\hat{x}$ defined by $\hat{x}_{\{1,\ldots,N\} - T^l} = 0$ and $\hat{x}_{T^l} = x^l_{T^l}$
6: else
7: Perform iteration $l + 1$
8: end if

Remark 1. (Stopping criteria for greedy methods.) In the case of corrupted measurements,
where $y = Ax + e$ for some noise vector $e$, the stopping criteria listed in Algorithms 1-3 may
never be achieved. Therefore, a suitable alternative stopping criterion must be employed. For our
analysis on bounding the error of approximation in the noisy case, we bound the approximation
error if the algorithm terminates after $l$ iterations. For example, we could change the algorithm
to require a maximum number of iterations $l$ as an input and then terminate the algorithm if our
stopping criterion is not met in fewer iterations. In practice, the user would be better served to
stop the algorithm when the residual is no longer improving. For a more thorough discussion of
suitable stopping criteria for each algorithm in the noisy case, see the original announcements of the
algorithms [7, 12, 29].
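Algorithm 3, combined with the kind of practical stopping rule discussed in Remark 1, can be sketched as follows. The update is read as a fixed step of size $\omega$ from the thresholded previous iterate; the relative tolerance `tol`, the cap `max_iter` and the function name `iht` are illustrative assumptions and are not taken from [7].

```python
import numpy as np


def iht(A, y, k, omega, max_iter=500, tol=1e-10):
    """Sketch of Iterative Hard Thresholding with a fixed step size omega."""
    n, N = A.shape
    y = np.asarray(y, dtype=float)
    x = np.zeros(N)                  # x^0 = 0 (already supported on T^0, which is empty)
    residual = y.copy()              # y^0 = y
    for _ in range(max_iter):
        # gradient step from the thresholded iterate: x + omega * A* residual
        step = x + omega * (A.conj().T @ residual)
        # T^l: k indices of largest magnitude entries
        T = np.argsort(np.abs(step))[-k:]
        x = np.zeros(N)
        x[T] = step[T]
        # y^l = y - A_T x_T, no pseudoinverse needed
        residual = y - A[:, T] @ x[T]
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return x
```

Each iteration costs two matrix-vector products and one partial sort, which is the source of the low per-iteration cost noted in the text; whether the iteration converges for a given `omega` depends on the aRIP constants of `A`, as quantified in Section 2.4.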

2.4. The Asymmetric Restricted Isometry Property

In this section we relax the sufficient conditions originally placed on Algorithms 1-3 by employing
a more general notion of a restricted isometry. As discussed in [6], the singular values of the
$n \times k$ submatrices of an arbitrary measurement matrix $A$ do not, in general, deviate from unity
symmetrically. The standard notion of the restricted isometry property (RIP) [10] has an inherent
symmetry which is unnecessarily restrictive. Hence, seeking the best possible conditions for the
measurement matrix under which Algorithms 1-3 will provably recover every $k$-sparse vector, we
reformulate the sufficient conditions in terms of the asymmetric restricted isometry property (aRIP)
[6].

Definition 1. For an $n \times N$ matrix $A$, the asymmetric RIP constants $L(k,n,N)$ and $U(k,n,N)$
are defined as:

$$L(k,n,N) := \min_{c \ge 0} c \quad \text{subject to} \quad (1-c)\|x\|_2^2 \le \|Ax\|_2^2, \ \forall x \in \chi^N(k); \tag{6}$$
$$U(k,n,N) := \min_{c \ge 0} c \quad \text{subject to} \quad (1+c)\|x\|_2^2 \ge \|Ax\|_2^2, \ \forall x \in \chi^N(k). \tag{7}$$

Remark 2. 1. The more common, symmetric definition of the RIP constants is recovered by
defining $R(k,n,N) = \max\{L(k,n,N), U(k,n,N)\}$. In this case, a matrix $A$ of size $n \times N$
has the RIP constant $R(k,n,N)$ if

$$R(k,n,N) := \min_{c \ge 0} c \quad \text{subject to} \quad (1-c)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+c)\|x\|_2^2, \ \forall x \in \chi^N(k).$$

2. Observe that $\chi^N(k) \subset \chi^N(k+1)$ for any $k$ and therefore the constants $L(k,n,N)$, $U(k,n,N)$,
and $R(k,n,N)$ are nondecreasing in $k$.

3. For all expressions involving $L(\cdot,n,N)$ it is understood, without explicit statement, that the
first argument is limited to the range where $L(\cdot,n,N) < 1$. Beyond this range of sparsity,
there exist vectors which are mapped to zero, and are unrecoverable.
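For very small matrices, the aRIP constants of Definition 1 can be computed directly, since the extreme values of $\|Ax\|_2^2/\|x\|_2^2$ over $k$-sparse $x$ are the extreme eigenvalues of the Gram matrices $A_I^* A_I$ over all supports $I$ of size $k$. The sketch below does this by brute force; the function name `arip_constants` is hypothetical, and the enumeration over all $\binom{N}{k}$ supports is only feasible for toy sizes.

```python
from itertools import combinations

import numpy as np


def arip_constants(A, k):
    """Brute-force aRIP constants (L, U) of A at sparsity level k."""
    n, N = A.shape
    lam_min, lam_max = np.inf, 0.0
    for support in combinations(range(N), k):
        cols = list(support)
        gram = A[:, cols].conj().T @ A[:, cols]
        eig = np.linalg.eigvalsh(gram)      # eigenvalues in ascending order
        lam_min = min(lam_min, eig[0])
        lam_max = max(lam_max, eig[-1])
    # The constraint c >= 0 in the definition clips both constants at zero.
    L = max(0.0, 1.0 - lam_min)
    U = max(0.0, lam_max - 1.0)
    return L, U


def rip_constant(A, k):
    """Symmetric RIP constant R = max(L, U)."""
    return max(arip_constants(A, k))
```

For a matrix such as `2 * np.eye(4)` the sketch returns $L = 0$ and $U = 3$, a simple case in which the two constants differ and the symmetric constant $R = 3$ hides the fact that no vector is shrunk at all.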

Using the aRIP, we analyze the three algorithms in the case of a general measurement matrix
$A$ of size $n \times N$. For each algorithm, the application of Definition 1 results in a relaxation of
the conditions imposed on $A$ to provably guarantee recovery of all $x \in \chi^N(k)$. We first present
a stability result for each algorithm in terms of bounding the approximation error of the output
after $l$ iterations. The bounds show a multiplicative stability constant in terms of aRIP constants
that amplifies the total energy of the noise. As a corollary, we obtain a sufficient condition on $A$ in
terms of the aRIP for exact recovery of all $k$-sparse vectors. The proofs of these results are found
in the Appendix. These theorems and corollaries take the same form, differing for each algorithm
only by the formulae for various factors. We state the general form of the theorems and corollaries,
analogous to Theorem 1 and Corollary 2, and then state the formulae for each of the algorithms
CoSaMP, SP, and IHT.

Theorem 3. Given a matrix $A$ of size $n \times N$ with aRIP constants $L(\cdot,n,N)$ and $U(\cdot,n,N)$, for
any $x \in \chi^N(k)$, let $y = Ax + e$, for some (unknown) noise vector $e$. Then there exists $\mu^{alg}(k,n,N)$
such that if $\mu^{alg}(k,n,N) < 1$, the output $\hat{x}$ of algorithm "alg" at the $l$th iteration approximates $x$
within the bound

$$\|\hat{x} - x\|_2 \le \kappa^{alg}(k,n,N) \left[\mu^{alg}(k,n,N)\right]^l \|x\|_2 + \frac{\xi^{alg}(k,n,N)}{1 - \mu^{alg}(k,n,N)} \|e\|_2, \tag{8}$$

for some $\kappa^{alg}(k,n,N)$ and $\xi^{alg}(k,n,N)$.

Corollary 4. Given a matrix $A$ of size $n \times N$ with aRIP constants $L(\cdot,n,N)$ and $U(\cdot,n,N)$, for
any $x \in \chi^N(k)$, let $y = Ax$. Then there exists $\mu^{alg}(k,n,N)$ such that if $\mu^{alg}(k,n,N) < 1$, the
algorithm "alg" exactly recovers $x$ from $y$ and $A$ in a finite number of iterations not to exceed

$$\ell_{max}^{alg}(x) := \left\lceil \frac{\log \nu_{min}(x) - \log \kappa^{alg}(k,n,N)}{\log \mu^{alg}(k,n,N)} \right\rceil + 1, \tag{9}$$

where $\nu_{min}(x)$ is defined as in (5).



We begin with Algorithm 1, the Compressive Sampling Matching Pursuit recovery algorithm
of Needell and Tropp [29]. We relax the sufficient recovery condition in [29] via the aRIP.

Theorem 5 (CoSaMP). Theorem 3 and Corollary 4 are satisfied by CoSaMP, Algorithm 1, with
$\kappa^{csp}(k,n,N) := 1$ and $\mu^{csp}(k,n,N)$ and $\xi^{csp}(k,n,N)$ defined as

$$\mu^{csp}(k,n,N) := \frac{1}{2}\left(2 + \frac{L(4k,n,N) + U(4k,n,N)}{1 - L(3k,n,N)}\right)\left(\frac{L(2k,n,N) + U(2k,n,N) + L(4k,n,N) + U(4k,n,N)}{1 - L(2k,n,N)}\right) \tag{10}$$

and

$$\xi^{csp}(k,n,N) := 2\left\{\left(2 + \frac{L(4k,n,N) + U(4k,n,N)}{1 - L(3k,n,N)}\right)\left(\frac{\sqrt{1 + U(2k,n,N)}}{1 - L(2k,n,N)}\right) + \frac{1}{\sqrt{1 - L(3k,n,N)}}\right\}. \tag{11}$$

Next, we apply the aRIP to Algorithm 2, Dai and Milenkovic's Subspace Pursuit [12]. Again,
the aRIP provides a sufficient condition that admits a wider range of measurement matrices than
admitted by the symmetric RIP condition derived in [12].

Theorem 6 (SP). Theorem 3 and Corollary 4 are satisfied by Subspace Pursuit, Algorithm 2,
with $\kappa^{sp}(k,n,N)$, $\mu^{sp}(k,n,N)$, and $\xi^{sp}(k,n,N)$ defined as

and 

2U(3k, n, N) 



1 - n sp (k, n, N) + 2n sp {k, n, N) 1 + 



1 - L(2k,n,N) 



2K° p (k,n,N) 
y/l-L(2k,n,N)' 

Finally, we apply the aRIP analysis to Algorithm 3, Iterative Hard Thresholding for Compressed
Sensing introduced by Blumensath and Davies [7]. Theorem 7 employs the aRIP to provide a weaker
sufficient condition than derived in [7].

Theorem 7 (IHT). Theorem 3 and Corollary 4 are satisfied by Iterative Hard Thresholding,
Algorithm 3, with $\kappa^{iht}(k,n,N) := 1$ and $\mu^{iht}(k,n,N)$ and $\xi^{iht}(k,n,N)$ defined as

$$\mu^{iht}(k,n,N) := 2\sqrt{2} \max\left\{\omega\left[1 + U(3k,n,N)\right] - 1,\ 1 - \omega\left[1 - L(3k,n,N)\right]\right\} \tag{15}$$

and

$$\xi^{iht}(k,n,N) := 2\omega\sqrt{1 + U(2k,n,N)}. \tag{16}$$

Remark 3. Each of Theorems 5, 6 and 7 is derived following the same recipe as in [29], [12]
and [7], respectively, using the aRIP rather than the RIP and taking care to maintain the least
restrictive bounds at each step (for details, see the Appendix). For Gaussian matrices, the aRIP
improves the lower bound on the phase transitions by nearly a factor of 2 when compared to
similar statements using the classical RIP. For IHT, the aRIP is simply a scaling of the matrix so
that its RIP bounds are minimal. This is possible for IHT as the factors in $\mu^{iht}(k,n,N)$ involve
$L(\alpha k,n,N)$ and $U(\alpha k,n,N)$ for only one value of $\alpha$, here $\alpha = 3$. No such scaling interpretation is
possible for CoSaMP and SP.
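The scaling interpretation for IHT can be checked numerically. Equating the two terms inside the maximum in (15), $\omega[1+U] - 1 = 1 - \omega[1-L]$, gives $\omega = 2/(2 + U - L)$, at which point the factor becomes $2\sqrt{2}\,(L+U)/(2+U-L)$. The sketch below evaluates both expressions; the function names and the sample values of $L(3k,n,N)$ and $U(3k,n,N)$ are purely illustrative.

```python
import math


def mu_iht(L3k, U3k, omega):
    """Convergence factor 2*sqrt(2)*max{omega(1+U)-1, 1-omega(1-L)} for step omega."""
    return 2.0 * math.sqrt(2.0) * max(
        omega * (1.0 + U3k) - 1.0,
        1.0 - omega * (1.0 - L3k),
    )


def balanced_omega(L3k, U3k):
    """Step size that equalizes the two terms inside the max."""
    return 2.0 / (2.0 + U3k - L3k)


if __name__ == "__main__":
    L3k, U3k = 0.1, 0.2  # illustrative aRIP values, not computed from a matrix
    omega = balanced_omega(L3k, U3k)
    print(omega, mu_iht(L3k, U3k, omega))
```

With the balanced step size the two competing terms coincide, so any other choice of $\omega$ can only increase the maximum; this is the sense in which a rescaling of the matrix minimizes the RIP-type bound for IHT.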
At this point, we digress to mention that the first greedy algorithm shown to have guaranteed
exact recovery capability is Needell and Vershynin's ROMP (Regularized Orthogonal Matching
Pursuit) [30]. We omit the algorithm and a rigorous discussion of the result, but state an aRIP
condition that will guarantee sparse recovery. ROMP chooses additions to the approximate support
sets at each iteration with a regularization step requiring comparability between the added terms.
This comparability requires a proof of partitioning a vector of length $N$ into subsets with comparable
coordinates, namely the magnitudes of the elements of the subset differ by no more than a factor
of 2. The proof that such a partition exists, with each partition having a nonzero energy, forces a
pessimistic bound that decays with the problem size.

Theorem 8 (Regularized Orthogonal Matching Pursuit). Let $A$ be a matrix of size $n \times N$
with aRIP constants L(2k,n,N) and U(2k,n,N). Define 



If p r (k,n, N) < + w ^3[(logn + 2) J ; then ROMP is guaranteed to exactly recover any x 6 
X N (k) from the measurements y = Ax in a finite number of iterations. 

Unfortunately, this dependence of the bound on the size of the problem instance forces the result
to be inadequate for large problem instances. In fact, this result is inferior to the results for the
three algorithms stated above which are all independent of problem size and therefore applicable
to the most interesting cases of compressed sensing, when $(k,n,N) \to \infty$ and $\delta = n/N \to 0$. It is
possible that this dependence on the problem size is an artifact of the technique of proof; without
removing this dependence, large problem instances will require the measurement matrix to be a
true isometry and the phase transition framework of the next section does not apply.

3. Phase Transitions for Greedy Algorithms with Gaussian Matrices 

The quantities $\mu^{alg}(k,n,N)$ and $\xi^{alg}(k,n,N)$ in Theorems 5, 6 and 7 dictate the current
theoretical convergence bounds for CoSaMP, SP, and IHT. Although some comparisons can be made
between the forms of $\mu^{alg}$ and $\xi^{alg}$ for different algorithms, it is not possible to quantitatively state
for what range of $k$ the algorithm will satisfy bounds on $\mu^{alg}(k,n,N)$ and $\xi^{alg}(k,n,N)$ for a specific
value of $n$ and $N$. To establish quantitative interpretations of the conditions in Theorems 5, 6 and
7, it is necessary to have quantitative bounds on the behaviour of the aRIP constants $L(k,n,N)$
and U(k,n, N) for the matrix A in question, [H [6]. Currently, there is no known matrix A for 
which it has been proven that $U(k,n,N)$ and $L(k,n,N)$ remain bounded above and away from
one, respectively, as $n$ grows, for $k$ and $N$ proportional to $n$. However, it is known that for some
random matrix ensembles, with exponentially high probability on the draw of $A$, $\frac{1}{1 - L(k,n,N)}$ and
$U(k,n,N)$ do remain bounded as $n$ grows, for $k$ and $N$ proportional to $n$. The ensemble with the
best known bounds on the growth rates of $L(k,n,N)$ and $U(k,n,N)$ in this setting is the Gaussian
ensemble. In this section, we consider large problem sizes as $(k,n,N) \to \infty$, with $n/N \to \delta$ and
$k/n \to \rho$ for $\delta, \rho \in (0,1)$. We study the implications of the sufficient conditions from Section 2 for
matrices with Gaussian i.i.d. entries, namely, entries drawn i.i.d. from the normal distribution with
mean 0 and variance $n^{-1}$, $\mathcal{N}(0, n^{-1})$.



p r (k,n,N) := U(2k,n,N) 1 + 



1 + U(2k, ra, N) 
1 - L(2k, n, N) 



) 



(17) 
Figure 3: Bounds, $L(\delta,\rho)$ and $U(\delta,\rho)$ (left and right respectively), above which it is exponentially unlikely that the
RIP constants $L(k,n,N)$ and $U(k,n,N)$ exceed, with entries in $A$ drawn i.i.d. $\mathcal{N}(0, n^{-1})$ and in the limit as $k/n \to \rho$
and $n/N \to \delta$ as $n \to \infty$, see Theorem 9.



Gaussian random matrices are well studied and much is known about the behavior of their
eigenvalues. Edelman [23] derived bounds on the probability distribution functions of the largest
and smallest eigenvalues of the Wishart matrices derived from a matrix $A$ with Gaussian i.i.d.
entries. Select a subset of columns indexed by $I \subset \{1, \ldots, N\}$ with cardinality $k$ and form the
submatrix $A_I$; the associated Wishart matrix derived from $A_I$ is the matrix $A_I^* A_I$. The
distribution of the most extreme eigenvalues of all $\binom{N}{k}$ Wishart matrices derived from $A$ with
Gaussian i.i.d. entries is only of recent interest and the exact probability distribution functions
are not known. Recently, using Edelman's bounds [23], the first three authors [6] derived upper
bounds on the probability distribution functions for the most extreme eigenvalues of all $\binom{N}{k}$ Wishart
matrices derived from $A$. These bounds enabled them to formulate upper bounds on the aRIP
constants, $L(k,n,N)$ and $U(k,n,N)$, for matrices of size $n \times N$ with Gaussian i.i.d. entries.

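Theorem 9 (Blanchard, Cartis, and Tanner [6]). Let $A$ be a matrix of size $n \times N$ whose entries
are drawn i.i.d. from $\mathcal{N}(0, n^{-1})$ and let $n \to \infty$ with $k/n \to \rho$ and $n/N \to \delta$. Let $H(p) :=
p\log(1/p) + (1-p)\log(1/(1-p))$ denote the usual Shannon Entropy with base $e$ logarithms, and let

$$\psi_{min}(\lambda,\rho) := H(\rho) + \frac{1}{2}\left[(1-\rho)\log\lambda + 1 - \rho + \rho\log\rho - \lambda\right], \tag{18}$$
$$\psi_{max}(\lambda,\rho) := \frac{1}{2}\left[(1+\rho)\log\lambda + 1 + \rho - \rho\log\rho - \lambda\right]. \tag{19}$$

Define $\lambda^{min}(\delta,\rho)$ and $\lambda^{max}(\delta,\rho)$ as the solution to (20) and (21), respectively:

$$\delta\psi_{min}(\lambda^{min}(\delta,\rho),\rho) + H(\rho\delta) = 0 \quad \text{for } \lambda^{min}(\delta,\rho) \le 1 - \rho, \tag{20}$$
$$\delta\psi_{max}(\lambda^{max}(\delta,\rho),\rho) + H(\rho\delta) = 0 \quad \text{for } \lambda^{max}(\delta,\rho) \ge 1 + \rho. \tag{21}$$

Define $L(\delta,\rho)$ and $U(\delta,\rho)$ as

$$L(\delta,\rho) := 1 - \lambda^{min}(\delta,\rho) \quad \text{and} \quad U(\delta,\rho) := \min_{\nu \in [\rho,1]} \lambda^{max}(\delta,\nu) - 1. \tag{22}$$

For any $\epsilon > 0$, as $n \to \infty$,

$$\text{Prob}\left(L(k,n,N) < L(\delta,\rho) + \epsilon\right) \to 1 \quad \text{and} \quad \text{Prob}\left(U(k,n,N) < U(\delta,\rho) + \epsilon\right) \to 1.$$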



The details of the proof of Theorem 9 are found in [6]. The bounds are derived using a simple
union bound over all $\binom{N}{k}$ of the $k \times k$ Wishart matrices $A_I^* A_I$ that can be formed from columns
of $A$. Bounds on the tail behavior of the probability distribution function for the largest and
smallest eigenvalues of $A_I^* A_I$ can be expressed in the form $p(n,\lambda)\exp(n\psi(\lambda,\rho))$ with $\psi$ defined in
(18) and (19) and $p(n,\lambda)$ a polynomial. Following standard practices in large deviation analysis,
the tails of the probability distribution functions are balanced against the exponentially large
number of Wishart matrices, (20) and (21), to define upper and lower bounds on the largest and
smallest eigenvalues of all $\binom{N}{k}$ Wishart matrices, with bounds $\lambda^{max}(\delta,\rho)$ and $\lambda^{min}(\delta,\rho)$, respectively.
Overestimation of the union bound over the combinatorial number of $\binom{N}{k}$ Wishart matrices causes
the bound $\lambda^{max}(\delta,\rho)$ to not be strictly increasing in $\rho$ for $\delta$ large; to utilize the best available bound
on the extreme of the largest eigenvalue, we note that any bound $\lambda^{max}(\delta,\nu)$ for $\nu \in [\rho,1]$ is also a
valid bound for submatrices of size $n \times k$. The asymptotic bounds of the aRIP constants, $L(\delta,\rho)$
and $U(\delta,\rho)$, follow directly. See Figure 3 for level curves of the bounds.
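The asymptotic bounds of Theorem 9 are defined implicitly, so evaluating them requires a one-dimensional root search. The sketch below uses plain bisection, which suffices because $\psi_{min}(\cdot,\rho)$ is increasing on $(0, 1-\rho]$ and $\psi_{max}(\cdot,\rho)$ is decreasing on $[1+\rho, \infty)$. It assumes that the factor in front of the brackets in (18) and (19) is $1/2$; all function names, the bisection depth, and the grid used for the minimization over $\nu$ are illustrative choices and are not taken from [6].

```python
import math


def shannon_entropy(p):
    """H(p) with natural logarithms; taken as 0 at the endpoints by continuity."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log(1.0 / p) + (1.0 - p) * math.log(1.0 / (1.0 - p))


def psi_min(lam, rho):
    return shannon_entropy(rho) + 0.5 * (
        (1 - rho) * math.log(lam) + 1 - rho + rho * math.log(rho) - lam)


def psi_max(lam, rho):
    return 0.5 * ((1 + rho) * math.log(lam) + 1 + rho - rho * math.log(rho) - lam)


def _bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_min(delta, rho):
    """Solution of delta * psi_min(lam, rho) + H(rho * delta) = 0 with lam <= 1 - rho."""
    f = lambda lam: delta * psi_min(lam, rho) + shannon_entropy(rho * delta)
    return _bisect(f, 1e-300, 1.0 - rho)


def lambda_max(delta, rho):
    """Solution of delta * psi_max(lam, rho) + H(rho * delta) = 0 with lam >= 1 + rho."""
    f = lambda lam: delta * psi_max(lam, rho) + shannon_entropy(rho * delta)
    hi = 1.0 + rho
    while f(hi) > 0:  # expand the bracket until the sign changes
        hi *= 2.0
    return _bisect(f, 1.0 + rho, hi)


def arip_bounds(delta, rho, grid=200):
    """Asymptotic bounds (L, U); the minimum over nu in [rho, 1] uses a uniform grid."""
    L = 1.0 - lambda_min(delta, rho)
    nus = [rho + (1.0 - rho) * i / grid for i in range(grid + 1)]
    U = min(lambda_max(delta, nu) for nu in nus) - 1.0
    return L, U
```

Substituting the resulting values at $2\rho$, $3\rho$ and $4\rho$ into the factors of Section 2.4 is how the theorems of this section convert aRIP conditions into curves in the $(\delta, \rho)$ unit square.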

With Theorem 9, we are able to formulate quantitative statements about the matrices $A$ with
Gaussian i.i.d. entries which satisfy the sufficient aRIP conditions from Section 2. A naive
replacement of each $L(\cdot,n,N)$ and $U(\cdot,n,N)$ in Theorems 5-7 with the asymptotic aRIP bounds in
Theorem 9 is valid in these cases. The properties necessary for this replacement are detailed in
Lemma [To] stated in the Appendix. For each algorithm (CoSaMP, SP and IHT) the recovery
conditions can be stated in the same format as Theorem 1 and Corollary 2, with only the expressions
for $\kappa(\delta,\rho)$, $\mu(\delta,\rho)$ and $\xi(\delta,\rho)$ differing. These recovery factors are stated in Theorems 10-12.



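Theorem 10. Theorem 1 and Corollary 2 are satisfied for CoSaMP, Algorithm 1, with $\kappa^{csp}(\delta,\rho) := 1$ and $\mu^{csp}(\delta,\rho)$ and $\xi^{csp}(\delta,\rho)$ defined as

\[ \mu^{csp}(\delta,\rho) := \frac{1}{2}\left(2 + \frac{L(\delta,4\rho) + U(\delta,4\rho)}{1 - L(\delta,3\rho)}\right)\left(\frac{L(\delta,2\rho) + U(\delta,2\rho) + L(\delta,4\rho) + U(\delta,4\rho)}{1 - L(\delta,2\rho)}\right) \tag{23} \]

and

\[ \xi^{csp}(\delta,\rho) := 2\left\{\left(2 + \frac{L(\delta,4\rho) + U(\delta,4\rho)}{1 - L(\delta,3\rho)}\right)\left(\frac{\sqrt{1 + U(\delta,2\rho)}}{1 - L(\delta,2\rho)}\right) + \frac{1}{\sqrt{1 - L(\delta,3\rho)}}\right\}. \tag{24} \]

The phase transition lower bound $\rho_S^{csp}(\delta)$ is defined as the solution to $\mu^{csp}(\delta,\rho) = 1$. $\rho_S^{csp}(\delta)$ is displayed as the black curve in Figure 1(a). $\mu^{csp}(\delta,\rho)$ and $\xi^{csp}(\delta,\rho)/(1 - \mu^{csp}(\delta,\rho))$ are displayed in Figure 2 panels (a) and (b) respectively.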
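The CoSaMP factors are simple rational expressions in the aRIP bounds, so they can be evaluated directly once $L(\delta,\rho)$ and $U(\delta,\rho)$ are available. The following is a minimal sketch, assuming $\mu^{csp} = \frac{1}{2}\bigl(2 + \frac{L_4+U_4}{1-L_3}\bigr)\frac{L_2+U_2+L_4+U_4}{1-L_2}$ and $\xi^{csp} = 2\bigl\{\bigl(2 + \frac{L_4+U_4}{1-L_3}\bigr)\frac{\sqrt{1+U_2}}{1-L_2} + \frac{1}{\sqrt{1-L_3}}\bigr\}$, where the subscript $a$ denotes evaluation at $(\delta, a\rho)$. The callables `L` and `U`, and the linear stand-ins `toy_L` and `toy_U`, are hypothetical placeholders and are not the bounds of Theorem 9.

```python
import math
from typing import Callable

# A bound is any callable (delta, rho) -> float; supplying the real
# bounds from Theorem 9 is left to the user (assumption).
Bound = Callable[[float, float], float]


def mu_csp(L: Bound, U: Bound, delta: float, rho: float) -> float:
    """Convergence factor for CoSaMP from aRIP bounds at 2*rho, 3*rho, 4*rho."""
    l2, l3, l4 = L(delta, 2 * rho), L(delta, 3 * rho), L(delta, 4 * rho)
    u2, u4 = U(delta, 2 * rho), U(delta, 4 * rho)
    return 0.5 * (2 + (l4 + u4) / (1 - l3)) * (l2 + u2 + l4 + u4) / (1 - l2)


def xi_csp(L: Bound, U: Bound, delta: float, rho: float) -> float:
    """Stability factor for CoSaMP from the same aRIP bounds."""
    l2, l3, l4 = L(delta, 2 * rho), L(delta, 3 * rho), L(delta, 4 * rho)
    u2, u4 = U(delta, 2 * rho), U(delta, 4 * rho)
    first = (2 + (l4 + u4) / (1 - l3)) * math.sqrt(1 + u2) / (1 - l2)
    return 2 * (first + 1 / math.sqrt(1 - l3))


# Hypothetical linear stand-ins, used only to exercise the functions.
def toy_L(delta: float, rho: float) -> float:
    return 5.0 * rho


def toy_U(delta: float, rho: float) -> float:
    return 8.0 * rho


mu_small = mu_csp(toy_L, toy_U, 0.5, 0.001)
mu_large = mu_csp(toy_L, toy_U, 0.5, 0.002)
```

With nondecreasing stand-ins the factor grows with $\rho$, which is the monotonicity that the phase transition definition $\mu^{csp}(\delta,\rho) = 1$ relies on.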

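Theorem 11. Theorem 1 and Corollary 2 are satisfied for Subspace Pursuit, Algorithm 2, with $\kappa^{sp}(\delta,\rho)$, $\mu^{sp}(\delta,\rho)$, and $\xi^{sp}(\delta,\rho)$ defined as

\[ \kappa^{sp}(\delta,\rho) := 1 + \frac{U(\delta,2\rho)}{1 - L(\delta,\rho)}, \tag{25} \]

\[ \mu^{sp}(\delta,\rho) := \frac{2U(\delta,3\rho)}{1 - L(\delta,\rho)}\left(1 + \frac{2U(\delta,3\rho)}{1 - L(\delta,2\rho)}\right)\left(1 + \frac{U(\delta,2\rho)}{1 - L(\delta,\rho)}\right), \tag{26} \]

and

\[ \xi^{sp}(\delta,\rho) := \frac{\sqrt{1 + U(\delta,\rho)}}{1 - L(\delta,\rho)}\left[1 - \mu^{sp}(\delta,\rho) + 2\kappa^{sp}(\delta,\rho)\left(1 + \frac{2U(\delta,3\rho)}{1 - L(\delta,2\rho)}\right)\right] + \frac{2\kappa^{sp}(\delta,\rho)}{\sqrt{1 - L(\delta,2\rho)}}. \tag{27} \]

The phase transition lower bound $\rho_S^{sp}(\delta)$ is defined as the solution to $\mu^{sp}(\delta,\rho) = 1$. $\rho_S^{sp}(\delta)$ is displayed as the magenta curve in Figure 1(a). $\mu^{sp}(\delta,\rho)$ and $\xi^{sp}(\delta,\rho)/(1 - \mu^{sp}(\delta,\rho))$ are displayed in Figure 2 panels (c) and (d) respectively.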

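Theorem 12. Theorem 1 and Corollary 2 are satisfied for Iterative Hard Thresholding, Algorithm 3, with $\omega := 2/(2 + U(\delta,3\rho) - L(\delta,3\rho))$, $\kappa^{iht}(\delta,\rho) := 1$, and $\mu^{iht}(\delta,\rho)$ and $\xi^{iht}(\delta,\rho)$ defined as

\[ \mu^{iht}(\delta,\rho) := 2\sqrt{2}\left(\frac{L(\delta,3\rho) + U(\delta,3\rho)}{2 + U(\delta,3\rho) - L(\delta,3\rho)}\right) \tag{28} \]

and

\[ \xi^{iht}(\delta,\rho) := \frac{4\sqrt{1 + U(\delta,2\rho)}}{2 + U(\delta,3\rho) - L(\delta,3\rho)}. \tag{29} \]

The phase transition lower bound $\rho_S^{iht}(\delta)$ is defined as the solution to $\mu^{iht}(\delta,\rho) = 1$. $\rho_S^{iht}(\delta)$ is displayed as the red curve in Figure 1(a). $\mu^{iht}(\delta,\rho)$ and $\xi^{iht}(\delta,\rho)/(1 - \mu^{iht}(\delta,\rho))$ are displayed in Figure 2 panels (e) and (f) respectively.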
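Each phase transition lower bound is defined implicitly as the root of $\mu^{alg}(\delta,\rho) = 1$ in $\rho$. Since $\mu^{alg}(\delta,\rho)$ is increasing in $\rho$, one way to locate that root numerically is bisection. The sketch below is only an illustration under stated assumptions: `phase_transition` and `mu_iht` are hypothetical helper names, $\mu^{iht}$ is taken in the form $2\sqrt{2}(L+U)/(2+U-L)$ with $L$ and $U$ evaluated at $(\delta,3\rho)$, and the linear bounds $L = U = 3c\rho$ are toy stand-ins chosen so that the root is known in closed form.

```python
from typing import Callable


def phase_transition(mu: Callable[[float], float], lo: float = 0.0,
                     hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection for the root of mu(rho) = 1, assuming mu is increasing on [lo, hi]."""
    if mu(hi) < 1.0:
        return hi  # the condition mu < 1 holds on the whole interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_iht(L3: float, U3: float) -> float:
    """IHT convergence factor from the values L(delta, 3 rho) and U(delta, 3 rho)."""
    return 2 * 2 ** 0.5 * (L3 + U3) / (2 + U3 - L3)


# Toy bounds (assumption): L = U = 3 * c * rho, so mu = 6 * sqrt(2) * c * rho.
c = 10.0
rho_s = phase_transition(lambda rho: mu_iht(3 * c * rho, 3 * c * rho))
```

For these toy bounds the root is $1/(6\sqrt{2}c)$, which gives a quick check of the bisection; with the true bounds of Theorem 9 the same routine would be applied for each fixed $\delta$.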

An analysis similar to that presented here for the greedy algorithms CoSaMP, SP, and IHT was previously carried out in [6] for the $\ell_1$-regularization problem (2). The form of the results differs from those of Theorem 1 and Corollary 2 in that no algorithm was specified for how (2) is solved. For this reason, no results are stated for the convergence rate or number of iterations. However, (2) can be reformulated as a convex quadratic or second-order cone programming problem (and its noiseless variant as a linear program), which have polynomial complexity when solved using interior point methods [33]. Moreover, convergence and complexity of other alternative algorithms for solving (2), such as gradient projection, have long been studied by the optimization community for more general problems [3, 31, 34], and recently, more specifically for (2) [25, 32] and many more. For completeness, we include the recovery conditions for $\ell_1$-regularization derived in [6]; these results follow from the original $\ell_1$-regularization bound derived by Foucart and Lai [26] for general $A$.

Theorem 13 (Blanchard, Cartis, and Tanner [6]). Given a matrix $A$ with entries drawn i.i.d. from $\mathcal{N}(0, n^{-1})$, for any $x \in \chi^N(k)$, let $y = Ax + e$ for some (unknown) noise vector $e$. Define

\[ \mu^{\ell_1}(\delta,\rho) := \frac{1 + \sqrt{2}}{4}\left(\frac{1 + U(\delta,2\rho)}{1 - L(\delta,2\rho)} - 1\right) \tag{30} \]

and

\[ \xi^{\ell_1}(\delta,\rho) := \frac{3(1 + \sqrt{2})}{1 - L(\delta,2\rho)} \tag{31} \]

with $L(\delta,\cdot)$ and $U(\delta,\cdot)$ defined as in Theorem 9. Let $\rho_S^{\ell_1}(\delta)$ be the unique solution to $\mu^{\ell_1}(\delta,\rho) = 1$. For any $\epsilon > 0$, as $(k,n,N) \to \infty$ with $n/N \to \delta \in (0,1)$ and $k/n \to \rho < (1 - \epsilon)\rho_S^{\ell_1}(\delta)$, there is an exponentially high probability on the draw of $A$ that

\[ \hat{x} := \arg\min_z \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \le \|e\|_2 \]

approximates $x$ within the bound

\[ \|x - \hat{x}\|_2 \le \frac{\xi^{\ell_1}(\delta,(1+\epsilon)\rho)}{1 - \mu^{\ell_1}(\delta,(1+\epsilon)\rho)}\|e\|_2. \tag{32} \]

$\rho_S^{\ell_1}(\delta)$ is displayed as the blue curve in Figure 1(a). $\mu^{\ell_1}(\delta,\rho)$ and $\xi^{\ell_1}(\delta,\rho)/(1 - \mu^{\ell_1}(\delta,\rho))$ are displayed in Figure 3 panels (c) and (d) respectively.

Corollary 14 (Blanchard, Cartis, and Tanner [6]). Given a matrix $A$ with entries drawn i.i.d. from $\mathcal{N}(0, n^{-1})$, for any $x \in \chi^N(k)$, let $y = Ax$. For any $\epsilon > 0$, with $n/N \to \delta \in (0,1)$ and $k/n \to \rho < (1 - \epsilon)\rho_S^{\ell_1}(\delta)$ as $(k,n,N) \to \infty$, there is an exponentially high probability on the draw of $A$ that

\[ \hat{x} := \arg\min_z \|z\|_1 \quad \text{subject to} \quad Az = y \]

exactly recovers $x$ from $y$ and $A$.



4. Discussion and Conclusions 

Summary. We have presented a framework in which recoverability results for sparse approximation 
algorithms derived using the ubiquitous RIP can be easily compared. This phase transition frame- 
work, [161 [201 EH E] , translates the generic RIP-based conditions of Theorem [3] into specific sparsity 
levels $k$ and problem sizes $n$ and $N$ for which the algorithm is guaranteed to satisfy the sufficient RIP conditions with high probability on the draw of the measurement matrix; see Theorem 1. Deriving (bounds on) the phase transitions requires bounds on the behaviour of the measurement matrix's RIP constants [1]. To achieve the most favorable quantitative bounds on the phase transitions, we used the less restrictive aRIP constants; moreover, we employed the best known bounds on aRIP constants, those provided for Gaussian matrices [6]; see Theorem 9.

This framework was illustrated on three exemplar greedy algorithms: CoSaMP [29], SP [12], and IHT [7]. The lower bounds on the phase transitions in Theorems 10-12 allow for a direct comparison of the current theoretical results/guarantees for these algorithms.

Computational Cost of CoSaMP, SP and IHT. The major computational cost per iteration in 
these algorithms is the application of one or more pseudoinverses. SP uses two pseudoinverses 
of dimensions k x n per iteration and another to compute the output vector x; see Algorithm 2. 
CoSaMP uses only one pseudoinverse per iteration but of dimensions 2k x n; see Algorithm 1. 
Consequently, CoSaMP and SP have identical computational cost per iteration, of order $kn^2$, if the pseudoinverse is computed using an exact QR factorization. IHT avoids computing a pseudoinverse
altogether in internal iterations, but is aided by one pseudoinverse of dimensions k x n on the final 
support set. Thus IHT has a substantially lower computational cost per iteration than CoSaMP 
and SP. Note that pseudoinverses may be computed approximately by an iterative method such 
as conjugate gradients [29]. As such, the exact application of a pseudoinverse could be entirely 
avoided, improving the implementation costs of these algorithms, especially of CoSaMP and SP. 
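To make the per-iteration cost comparison concrete, the following is a minimal pure-Python sketch of the internal IHT iteration, which uses only one product with $A$ and one with $A^*$ followed by a hard threshold, and no pseudoinverse. It is an illustration under stated assumptions, not the paper's Algorithm: the helper names are hypothetical, the final least-squares solve on the recovered support is omitted, and the step size `omega` is a conservative choice $1/\|A\|_F^2$ rather than the value of $\omega$ used in Theorem 12.

```python
import random


def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def iht(A, y, k, omega, iters):
    """Run iters IHT steps from x = 0; also return the residual norms."""
    x = [0.0] * len(A[0])
    residuals = []
    for _ in range(iters):
        r = [yi - ai for yi, ai in zip(y, matvec(A, x))]  # one product with A
        residuals.append(sum(ri * ri for ri in r) ** 0.5)
        g = rmatvec(A, r)                                  # one product with A^T
        x = hard_threshold([xj + omega * gj for xj, gj in zip(x, g)], k)
    return x, residuals


random.seed(0)
n, N, k = 20, 40, 2
A = [[random.gauss(0.0, 1.0 / n ** 0.5) for _ in range(N)] for _ in range(n)]
x_true = [0.0] * N
x_true[3], x_true[17] = 1.5, -2.0
y = matvec(A, x_true)
# Conservative step (assumption): 1 / ||A||_F^2 <= 1 / ||A||_2^2.
omega = 1.0 / sum(a * a for row in A for a in row)
x_hat, residuals = iht(A, y, k, omega, iters=50)
```

With this conservative step the residual norm cannot increase from one iteration to the next, which makes the sketch easy to sanity-check; it says nothing about the convergence rate obtained with the step size analysed in the paper.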

Globally, all three algorithms converge linearly; in fact, they converge in a finite number of iterations provided there exists a $k$-sparse solution to $Ax = y$ and a sufficient aRIP condition is satisfied; see Corollary 2. For each algorithm, the upper bound on the required number of iterations grows without bound as $\mu^{alg}(k,n,N) \to 1$. Hence, according to the bounds presented here, to ensure rapid convergence, it is advantageous to have a matrix that satisfies a stricter condition, such as $\mu^{alg}(k,n,N) < \frac{1}{2}$. Similarly, the factor controlling stability to additive noise, namely the vector $e$ in Theorem 1, blows up as $\mu^{alg}(k,n,N) \to 1$. Again, according to the bounds presented here, in order to guarantee stability with small amplification of the additive noise, it is necessary to restrict the range of $\frac{\xi^{alg}}{1 - \mu^{alg}}(k,n,N)$. A phase transition function analogous to the functions $\rho_S^{alg}(\delta)$ can be easily computed in these settings as well, resulting in curves lower than those presented in Figure 1(a). This is the standard trade-off of compressed sensing, where one must determine the appropriate balance between computational efficiency, stability, and minimizing the number of measurements.
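The effect of $\mu^{alg}$ approaching 1 can be seen with two one-line computations. The sketch below is illustrative only: `iterations_for_tolerance` (the smallest $l$ with $\mu^l \le$ a target tolerance) and `noise_amplification` (the factor $\xi/(1-\mu)$) are hypothetical helper names, and the sample values of $\mu$ and $\xi$ are arbitrary.

```python
import math


def iterations_for_tolerance(mu: float, tol: float) -> int:
    """Smallest l with mu**l <= tol, assuming 0 < mu < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log(mu))


def noise_amplification(mu: float, xi: float) -> float:
    """Factor xi / (1 - mu) multiplying the noise norm in the error bound."""
    return xi / (1.0 - mu)


# Arbitrary sample values: a comfortable factor versus one close to 1.
fast = iterations_for_tolerance(0.5, 1e-3)
slow = iterations_for_tolerance(0.99, 1e-3)
```

Both quantities grow without bound as $\mu \to 1$, which is why a stricter condition such as $\mu^{alg} < \frac{1}{2}$ trades a smaller recovery region for faster convergence and better stability.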

Comparison of Phase Transitions and Constants of Proportionality. From Figure 1(a), we see that the best known lower bounds on the phase transitions for the three greedy algorithms satisfy the ordering $\rho_S^{csp}(\delta) < \rho_S^{sp}(\delta) < \rho_S^{iht}(\delta)$ for Gaussian measurement matrices. Therefore, we now know that, at least for Gaussian matrices, according to existing theory, IHT has the largest region where recovery for all signals can be guaranteed; the regions with similar guarantees for SP and CoSaMP are considerably smaller. Moreover, IHT has the lowest per-iteration computational cost of the three algorithms.

The phase transition bounds $\rho_S^{alg}(\delta)$ also allow a precise comparison of the recoverability results derived for these greedy algorithms with those proven for $\ell_1$-regularization using the aRIP; see Figure 1. Although [29, 12, 7] have provided guarantees of successful sparse recovery analogous to those for $\ell_1$-regularization, the greedy algorithms place a more restrictive aRIP condition on the suitable matrices to be used in the algorithm. However, some of the algorithms for solving the $\ell_1$-regularization problem, such as interior point methods, are, in general, computationally more expensive than the greedy methods discussed in this paper, and hence attention needs to be paid to the method of choice for solving the $\ell_1$-regularization problem [2, 25].

The lower bounds on the phase transitions presented here can also be read as lower bounds on the constant of proportionality in the oversampling rate, namely, taking $n > (\rho_S^{alg}(\delta))^{-1} k$ measurements rather than the oracle rate of $k$ measurements is sufficient if algorithm "alg" is used to recover the $k$-sparse signal. From Figure 1(b), it is clear that according to the conditions presented here, the convergence of greedy algorithms can only be guaranteed with substantially more measurements than for $\ell_1$-regularization. The lowest possible number of measurements (when $n = N$ so $\delta = 1$) for the algorithms are as follows: $n > 907k$ for IHT, $n > 3124k$ for SP, and $n > 4923k$ for CoSaMP. On the other hand, an aRIP analysis of $\ell_1$-regularization yields that linear programming requires $n > 317k$. In fact, using a geometric, convex polytopes approach, Donoho has shown that for $\ell_1$-regularization, $n > 5.9k$ is a sufficient number of measurements [6, 16, 18] when the target signal, $x$, is exactly $k$-sparse, and the multiple 5.9 increases smoothly as noise is added [38].

Future Improvements and Conclusions. The above bounds on greedy algorithms' phase transitions 
could be improved by further refining the algorithms' theory, namely, deriving less strict aRIP 
conditions on the measurement matrix that still ensure convergence of the algorithm; as the latter 
is an active research topic, we expect such developments to take place. The phase transition 
framework presented here may also be applied to such advances. Alternatively, increasing the lower 
bounds on the phase transitions could be expected to occur from improving the upper bounds we 
employed on the aRIP constants of the Gaussian measurement matrices; see Theorem 9. However,
extensive empirical calculations of lower estimates of aRIP constants show the latter to be within 
a factor of 1.83 of our proven upper bounds [6]. During the revision of this manuscript, improved 
bounds on the aRIP constants for the Gaussian ensemble were derived pQ , tightening the bound to 
be within 1.57 of lower estimates. However, for $\rho \approx 10^{-3}$ both bounds were already very sharp [1],
and the resulting increase of the phase transitions shown here was under 0.5%. 



Appendix A. Proofs of Main Results 



We present a framework by which RIP-based convergence results of the form presented in Theorem 3 can be translated into results of the form of Theorem 1, that is, removing explicit dependencies on RIP constants in favour of their bounds.

The proofs of Theorems 5, 6 and 7 rely heavily on a sequence of properties of the aRIP constants, which are summarized in Lemma 15 and proven in Appendix A.1. Theorems 10, 11 and 12 follow from Theorems 5, 6 and 7 and the form of $\mu^{alg}$ and $\xi^{alg}$ as functions of $L$ and $U$; this latter point is summarized in Lemma 16, which is stated and proven in Appendix A.1.



The resulting Theorems 10, 11 and 12 can then be interpreted in the phase transition framework 
advocated by Donoho et al. [T6l HHJ US] [20], [22], as we have explained in Section [4j 

The remainder of the Appendix is organized by algorithms, with each subsection first proving 
convergence bounds for generic aRIP bounds, followed by the Gaussian specific variants as functions 
of $(\delta,\rho)$. For the results pertaining to $\ell_1$-regularization, the reader is directed to [6].

Appendix A.1. Technical Lemmas

Throughout the analysis of the algorithms, we repeatedly use implications of the aRIP on a matrix $A$ as outlined in Lemma 15. This lemma has been proven in the symmetric case repeatedly in the literature; we include the proof of the asymmetric variant for completeness.

Recall that for some index sets $I, J \subset \{1,\ldots,N\}$, the restriction of a vector $x$ to the set $I$ is denoted $x_I$; i.e. $(x_I)_i = x_i$ for $i \in I$ and $(x_I)_i = 0$ for $i \notin I$. Furthermore, the submatrix of $A$ derived by selecting the columns of $A$ indexed by $I$ is denoted $A_I$. In either case, $x_{I-J}$ denotes the restriction of $x$ to the set of indices in $I$ that are not in $J$; likewise $A_{I-J}$ is the submatrix formed by columns of $A$ indexed by the set $I - J$. Finally, let $\mathbb{R}^I$ denote the set of vectors in $\mathbb{R}^N$ whose support is contained in $I$.

Lemma 15 (Implications of aRIP). Let $I$ and $J$ be two disjoint index sets, namely $I, J \subset \{1,\ldots,N\}$; $I \cap J = \emptyset$. Suppose $A$ is a matrix of size $n \times N$ with aRIP constants $L(|I| + |J|,n,N)$ and $U(|I| + |J|,n,N)$, and let $u \in \mathbb{R}^I$, $v \in \mathbb{R}^J$, $y \in \mathbb{R}^n$, $\omega \in (0,1)$, and $Id$ the identity matrix of appropriate size. Then Definition 1 implies each of the following:

(i) $\|A_I^* y\|_2 \le \sqrt{1 + U(|I|,n,N)}\,\|y\|_2$

(ii) $(1 - L(|I|,n,N))\|u\|_2 \le \|A_I^* A_I u\|_2 \le (1 + U(|I|,n,N))\|u\|_2$

(iii) $\|A_I^\dagger y\|_2 \le (1 - L(|I|,n,N))^{-1/2}\|y\|_2$

(iv) $|\langle A_I u, A_J v\rangle| \le \frac{1}{2}\left(L(|I| + |J|,n,N) + U(|I| + |J|,n,N)\right)\|u\|_2\|v\|_2$

(v) $\|A_I^* A_J v\|_2 \le U(|I| + |J|,n,N)\|v\|_2$

(vi) $\|(Id - \omega A_I^* A_I)u\|_2 \le \max\{\omega(1 + U(|I|,n,N)) - 1,\ 1 - \omega(1 - L(|I|,n,N))\}\|u\|_2$.



Proof. From Remark 2 it is clear that the aRIP constants are nondecreasing in the first argument pertaining to the sparsity level. Therefore, $A$ must also have the aRIP constants $L(|I|,n,N) \le L(|I| + |J|,n,N)$ and $U(|I|,n,N) \le U(|I| + |J|,n,N)$. Also, Definition 1 implies that the singular values of the submatrix $A_I$ are contained in the interval $[\sqrt{1 - L(|I|,n,N)}, \sqrt{1 + U(|I|,n,N)}]$. Thus, (i)-(iii) follow from the standard relationships between the singular values of $A_I$ and the associated matrix in (i)-(iii).

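To prove (iv), let $m = |I| + |J|$. We may assume $\|u\|_2 = \|v\|_2 = 1$; otherwise we normalize the vectors. Let $\alpha = A_I u$ and $\beta = A_J v$. Then, since $I \cap J = \emptyset$,

\[ \|\alpha \pm \beta\|_2^2 = \|A_I u \pm A_J v\|_2^2 = \left\|[A_I, A_J]\begin{bmatrix} u \\ \pm v \end{bmatrix}\right\|_2^2 \tag{A.1} \]

and

\[ \left\|\begin{bmatrix} u \\ \pm v \end{bmatrix}\right\|_2^2 = \|u\|_2^2 + \|v\|_2^2 = 2. \tag{A.2} \]

$[A_I, A_J]$ is a submatrix of $A$ of size $n \times m$, so applying Definition 1 to the right most portion of (A.1) and invoking (A.2), we have

\[ 2(1 - L(m,n,N)) \le \|\alpha \pm \beta\|_2^2 \le 2(1 + U(m,n,N)). \tag{A.3} \]

By polarization and (A.3),

\[ \langle\alpha,\beta\rangle = \frac{\|\alpha + \beta\|_2^2 - \|\alpha - \beta\|_2^2}{4} \le \frac{L(m,n,N) + U(m,n,N)}{2} \]

and

\[ -\langle\alpha,\beta\rangle = \frac{\|\alpha - \beta\|_2^2 - \|\alpha + \beta\|_2^2}{4} \le \frac{L(m,n,N) + U(m,n,N)}{2}. \]

Thus $|\langle A_I u, A_J v\rangle| = |\langle\alpha,\beta\rangle| \le (L(m,n,N) + U(m,n,N))/2$, establishing (iv).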

Since $I \cap J = \emptyset$, the matrix $-A_I^* A_J$ is a submatrix of $Id_{I \cup J} - A_{I \cup J}^* A_{I \cup J}$. To establish (v), we observe that the aRIP implies that the eigenvalues of $A_K^* A_K$, for every size $n \times m$ submatrix $A_K$ of $A$, lie in the interval $[1 - L(m,n,N), 1 + U(m,n,N)]$. Thus the eigenvalues of $Id_{I \cup J} - A_{I \cup J}^* A_{I \cup J}$ must lie in $[0, U(m,n,N)]$. Therefore $\|A_I^* A_J v\|_2 = \|-A_I^* A_J v\|_2 \le U(m,n,N)\|v\|_2$, which completes the proof of (v).

To prove (vi), note that $\|Id - \omega A_I^* A_I\|_2$ is bounded above by the maximum magnitude of the eigenvalues of $(\omega A_I^* A_I - Id)$, which lie in the interval with endpoints $\omega(1 - L(|I|,n,N)) - 1$ and $\omega(1 + U(|I|,n,N)) - 1$. ■
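The polarization argument behind (iv) can be checked numerically in the smallest case, where $I$ and $J$ each contain one column. The sketch below is an illustration under stated assumptions: `gram_extremes` is a hypothetical helper, and the extreme eigenvalues of the single $2 \times 2$ Gram matrix stand in for the aRIP constants. The true constants are extremes over all column subsets, so they can only enlarge the right-hand side of (iv).

```python
import math
import random


def gram_extremes(a, b):
    """Extreme eigenvalues of the 2x2 Gram matrix of columns a, b, and <a, b>."""
    p = sum(x * x for x in a)
    q = sum(x * x for x in b)
    c = sum(x * y for x, y in zip(a, b))
    mean = 0.5 * (p + q)
    radius = math.hypot(0.5 * (p - q), c)  # half the gap between eigenvalues
    return mean - radius, mean + radius, c


random.seed(1)
n = 50
# Two columns with i.i.d. N(0, 1/n) entries, as in the Gaussian ensemble.
a = [random.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
b = [random.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

lam_min, lam_max, inner = gram_extremes(a, b)
L_local = 1.0 - lam_min  # local lower constant for this column pair
U_local = lam_max - 1.0  # local upper constant for this column pair
bound = 0.5 * (L_local + U_local)
```

Here `bound` equals half the eigenvalue gap, and $|\langle a, b\rangle|$ never exceeds it, in line with (iv) for unit coefficients $u = v = 1$.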

Theorems 10, 11 and 12 follow from Theorems 5, 6 and 7 and the form of $\mu^{alg}$ and $\xi^{alg}$ as functions of $L$ and $U$. We formalize the relevant functional dependencies in the next three lemmas.

Lemma 16. For some $\tau < 1$, define the set $Z := (0,\tau)^p \times (0,\infty)^q$ and let $F : Z \to \mathbb{R}$ be continuously differentiable on $Z$. Let $A$ be a Gaussian matrix of size $n \times N$ with aRIP constants $L(\cdot,n,N)$, $U(\cdot,n,N)$ and let $L(\delta,\cdot)$, $U(\delta,\cdot)$ be defined as in Theorem 9. Define $\mathbf{1}$ to be the vector of all ones, and

\[ z(k,n,N) := [L(k,n,N),\ldots,L(pk,n,N),U(k,n,N),\ldots,U(qk,n,N)] \tag{A.4} \]
\[ z(\delta,\rho) := [L(\delta,\rho),\ldots,L(\delta,p\rho),U(\delta,\rho),\ldots,U(\delta,q\rho)]. \tag{A.5} \]

(i) Suppose, for all $t \in Z$, $(\nabla F[t])_i \ge 0$ for all $i = 1,\ldots,p+q$ and for any $v \in Z$ we have $\nabla F[t] \cdot v > 0$. Then for any $c\epsilon > 0$, as $(k,n,N) \to \infty$ with $\frac{n}{N} \to \delta$, $\frac{k}{n} \to \rho$, there is an exponentially high probability on the draw of the matrix $A$ that

\[ \mathrm{Prob}\left(F[z(k,n,N)] < F[z(\delta,\rho) + \mathbf{1}c\epsilon]\right) \to 1 \quad \text{as } n \to \infty. \tag{A.6} \]

(ii) Suppose, for all $t \in Z$, $(\nabla F[t])_i \ge 0$ for all $i = 1,\ldots,p+q$ and there exists $j \in \{1,\ldots,p\}$ such that $(\nabla F[t])_j > 0$. Then there exists $c \in (0,1)$ depending only on $F$, $\delta$, and $\rho$ such that for any $\epsilon \in (0,1)$

\[ F[z(\delta,\rho) + \mathbf{1}c\epsilon] < F[z(\delta,(1+\epsilon)\rho)], \tag{A.7} \]

and so there is an exponentially high probability on the draw of $A$ that

\[ \mathrm{Prob}\left(F[z(k,n,N)] < F[z(\delta,(1+\epsilon)\rho)]\right) \to 1 \quad \text{as } n \to \infty. \tag{A.8} \]

Also, $F(z(\delta,\rho))$ is strictly increasing in $\rho$.

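Proof. To prove (i), suppose $u, v \in Z$ with $v_i > u_i$ for all $i = 1,\ldots,p+q$. From Taylor's Theorem, $F[v] = F[u + (v - u)] = F[u] + \nabla F[t] \cdot [v - u]$ with $t = u + \lambda[v - u]$ for some $\lambda \in (0,1)$. Then

\[ F[v] > F[u] \tag{A.9} \]

since, by assumption, $\nabla F[t] \cdot [v - u] > 0$.

From Theorem 9, for any $c\epsilon > 0$ and any $i = 1,\ldots,p+q$, as $(k,n,N) \to \infty$ with $\frac{n}{N} \to \delta$, $\frac{k}{n} \to \rho$,

\[ \mathrm{Prob}\left(z(k,n,N)_i < z(\delta,\rho)_i + c\epsilon\right) \to 1, \]

with convergence to 1 exponential in $n$. Therefore, letting $v_i := z(\delta,\rho)_i + c\epsilon$ and $u_i := z(k,n,N)_i$, for all $i = 1,\ldots,p+q$, we conclude from (A.9) that

\[ \mathrm{Prob}\left(F[z(k,n,N)] < F[z(\delta,\rho) + \mathbf{1}c\epsilon]\right) \to 1, \]

again with convergence to 1 exponential in $n$.

To establish (ii), we take the Taylor expansion of $F$ centered at $z(\delta,\rho)$, namely

\[ F[z(\delta,\rho) + \mathbf{1}c\epsilon] = F[z(\delta,\rho)] + \nabla F[t_1] \cdot \mathbf{1}c\epsilon \quad \text{for } t_1 \in (z(\delta,\rho), z(\delta,\rho) + \mathbf{1}c\epsilon) \tag{A.10} \]
\[ F[z(\delta,(1+\epsilon)\rho)] = F[z(\delta,\rho)] + \left(\nabla F[z(\delta,\rho)] \cdot \frac{\partial}{\partial\rho}z(\delta,\rho)\right)\bigg|_{\rho = t_2}\epsilon\rho \quad \text{for } t_2 \in (\rho,(1+\epsilon)\rho). \tag{A.11} \]

Select

\[ t_1^* = \arg\max\left\{\nabla F[t_1] \cdot \mathbf{1} : t_1 \in [z(\delta,\rho), z(\delta,\rho) + \mathbf{1}]\right\} \]
\[ t_2^* = \arg\min\left\{\left(\nabla F[z(\delta,\rho)] \cdot \frac{\partial}{\partial\rho}z(\delta,\rho)\right)\bigg|_{\rho = t_2} : t_2 \in [\rho,(1+\epsilon)\rho]\right\} \]

so that

\[ F[z(\delta,\rho) + \mathbf{1}c\epsilon] \le F[z(\delta,\rho)] + \nabla F[t_1^*] \cdot \mathbf{1}c\epsilon \tag{A.12} \]
\[ F[z(\delta,(1+\epsilon)\rho)] \ge F[z(\delta,\rho)] + \left(\nabla F[z(\delta,\rho)] \cdot \frac{\partial}{\partial\rho}z(\delta,\rho)\right)\bigg|_{\rho = t_2^*}\epsilon\rho. \tag{A.13} \]

Since $L(\delta,\rho)$ is strictly increasing in $\rho$ [6], then $\left(\frac{\partial}{\partial\rho}z(\delta,\rho)\right)_j\big|_{\rho = t_2^*} > 0$ for all $j = 1,\ldots,p$. Since $U(\delta,\rho)$ is nondecreasing in $\rho$ [6], then $\left(\frac{\partial}{\partial\rho}z(\delta,\rho)\right)_i\big|_{\rho = t_2^*} \ge 0$ for all $i = p+1,\ldots,p+q$. Hence, by the hypotheses of (ii),

\[ \left(\nabla F[z(\delta,\rho)] \cdot \frac{\partial}{\partial\rho}z(\delta,\rho)\right)\bigg|_{\rho = t_2^*} > 0 \quad \text{and} \quad \nabla F[t_1^*] \cdot \mathbf{1} > 0. \]

Therefore, for any $c$ satisfying

\[ 0 < c < \min\left\{1,\ \rho\,\frac{\left(\nabla F[z(\delta,\rho)] \cdot \frac{\partial}{\partial\rho}z(\delta,\rho)\right)\big|_{\rho = t_2^*}}{\nabla F[t_1^*] \cdot \mathbf{1}}\right\}, \]

(A.12) and (A.13) imply (A.7). Since the hypotheses of (ii) imply those of (i), (A.6) also holds, and so (A.8) follows. $F(z(\delta,\rho))$ strictly increasing follows from the hypotheses of (ii) and $L(\delta,\rho)$ and $U(\delta,\rho)$ strictly increasing and nondecreasing in $\rho$, respectively [6]. ■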

Let the superscript alg denote the algorithm identifier so that p al9 (k,n, N) is defined by one 
of go), (13)), (D}, while p al9 (5,p) is defined by one of @, @. Next, a simple property is 
summarized in Lemma 17, that further reveals some necessary ingredients of our analysis. 

Lemma 17. Assume that $\mu^{alg}(\delta,\rho)$ is strictly increasing in $\rho$ and let $\rho_S^{alg}(\delta)$ solve $\mu^{alg}(\delta,\rho) = 1$. For any $\epsilon \in (0,1)$, if $\rho < (1 - \epsilon)\rho_S^{alg}(\delta)$, then $\mu^{alg}(\delta,(1+\epsilon)\rho) < 1$.

Proof. Let $\rho_\epsilon^{alg}(\delta)$ be the solution to $\mu^{alg}(\delta,(1+\epsilon)\rho) = 1$. Since by definition, $\rho_S^{alg}(\delta)$ denotes a solution to $\mu^{alg}(\delta,\rho) = 1$, and this solution is unique as $\mu^{alg}(\delta,\rho)$ is strictly increasing, we must have $(1+\epsilon)\rho_\epsilon^{alg}(\delta) = \rho_S^{alg}(\delta)$. Since $(1 - \epsilon) < (1 + \epsilon)^{-1}$ for all $\epsilon \in (0,1)$, we have $(1 - \epsilon)\rho_S^{alg}(\delta) < \rho_\epsilon^{alg}(\delta)$. If $\rho < (1 - \epsilon)\rho_S^{alg}(\delta)$, then since $\mu^{alg}(\delta,\rho)$ is strictly increasing in $\rho$,

\[ \mu^{alg}(\delta,(1+\epsilon)\rho) < \mu^{alg}(\delta,(1+\epsilon)(1-\epsilon)\rho_S^{alg}(\delta)) < \mu^{alg}(\delta,(1+\epsilon)\rho_\epsilon^{alg}(\delta)) = 1. \]



Note that Lemma 16(ii) with $F := \mu^{alg}$ will be employed to show the first assumption in Lemma 17; this is but one of several good uses of Lemma 16 that we will make.

Corollaries 2 and 4 are easily derived from Lemma 18. Note that this lemma demonstrates only that the support set has been recovered. The proof of Lemma 18 is a minor generalization of a proof from [12, Theorem 7].

Lemma 18. Suppose, after $l$ iterations, algorithm alg returns the $k$-sparse approximation $\hat{x}^l$ to a $k$-sparse target signal $x$. Suppose there exist constants $\mu$ and $\kappa$ independent of $l$ and $x$ such that

\[ \|x - \hat{x}^l\|_2 \le \kappa\mu^l\|x\|_2. \tag{A.14} \]

If $\mu < 1$, then the support set of $\hat{x}^l$ coincides with the support set of $x$ after at most $\ell_{max}^{alg}(x)$ iterations, where

\[ \ell_{max}^{alg}(x) := \left\lceil\frac{\log\nu_{min}(x) - \log\kappa}{\log\mu}\right\rceil + 1, \tag{A.15} \]

where $\nu_{min}(x)$ is defined in (5).

Proof. Let $T$ be the support set of $x$ and $T^l$ be the support set of $\hat{x}^l$; as $x, \hat{x}^l \in \chi^N(k)$, $|T|, |T^l| \le k$. From the definition (A.15) of $\ell_{max}^{alg}(x)$ and (5), $\kappa\mu^{\ell_{max}^{alg}(x)}\|x\|_2 < \min_{i \in T}|x_i|$. From (A.14), we then have

\[ \|x - \hat{x}^{\ell_{max}^{alg}(x)}\|_2 \le \kappa\mu^{\ell_{max}^{alg}(x)}\|x\|_2 < \min_{i \in T}|x_i|, \]

which clearly implies that $T \subset T^{\ell_{max}^{alg}(x)}$. Since $|T| = |T^{\ell_{max}^{alg}(x)}|$, the sets must be equal. ■

Theorems 5, 6 and 7 define the constants $\mu = \mu^{alg}(k,n,N)$ and $\kappa$ to be used in Lemma 18 for proving Corollary 4. For CoSaMP and IHT, $\kappa = \kappa^{alg}(k,n,N) = 1$. For SP, the term involving $\kappa$ is removed by combining Lemmas 23 and 24 (with $e = 0$) to obtain

\[ \|x_{T - T^l}\|_2 \le \mu^{sp}(k,n,N)\|x_{T - T^{l-1}}\|_2; \tag{A.16} \]

applying (A.16) iteratively provides

\[ \|x_{T - T^l}\|_2 \le \left[\mu^{sp}(k,n,N)\right]^l\|x\|_2, \tag{A.17} \]

which again gives $\kappa = 1$. Similarly, Theorems 10, 11 and 12 define the constants $\mu = \mu^{alg}(\delta,\rho)$ and $\kappa$ to be used in Lemma 18 for proving Corollary 2, with the above comments on the IHT choice of $\kappa$ also applying in this case.

To ensure exact recovery of the target signal, namely, to complete the proof of Corollaries 2 and 4, we actually need something stronger than recovering the support set as implied by Lemma 18. For CoSaMP and SP, since the algorithms employ a pseudoinverse at an appropriate step, the output is then the exact sparse signal. For IHT, no pseudoinverse has been applied; thus, to recover the signal exactly, one simply determines $T$ from the output vector and then $x = A_T^\dagger y$. These comments and Lemma 18 now establish Corollaries 2 and 4 for each algorithm, and we will not restate the proof for each individual algorithm.

In each of the following subsections, we first consider the case of general measurement matrices, $A$, and prove the results from Section 2 which establish an aRIP condition for an algorithm. We then proceed to choose a specific matrix ensemble, matrices with Gaussian i.i.d. entries, for which Section 3 establishes lower bounds on the phase transition for exact recovery of all $x \in \chi^N(k)$ and then provide probabilistic bounds on the multiplicative stability factors.

Appendix A.2. Proofs for CoSaMP

In this section we prove the results from Sections 2 and 3 reported for CoSaMP [29]. The proofs mimic those of Needell and Tropp while employing the aRIP constants. In each proof, the smallest possible support is retained for the aRIP constants in order to acquire from this method of analysis the best possible conditions on the measurement matrix used in the CoSaMP algorithm. This change is in many cases straightforward, requiring only a substitution of $U(ak,n,N)$ or $L(ak,n,N)$ for $R(ak,n,N)$, for some $a \in \{2,3,4\}$. In such cases we simply restate the result. Where there is a more substantial change, we provide fuller details of the proof.

The argument proceeds in [29] by establishing bounds on the approximation error at a given iteration in terms of the approximation error at the previous iteration, and the energy of the noise. Since each iteration of the CoSaMP algorithm consists of essentially four steps, this was achieved by a series of four lemmas [29, Lemmas 4.2 to 4.5], one for each step. We restate [29, Lemmas 4.3 and 4.5] (the support merger and pruning steps respectively) without any alteration, and provide an outline of how the proofs of [29, Lemmas 4.2 and 4.4] (identification and estimation) can be adapted. To simplify the working, we follow [29] and introduce some further notation: let the set of $2k$ indices corresponding to the largest magnitude entries of $A^* y^{l-1}$ in Step 1 of Algorithm 1 be denoted by $\Omega$. Also let $r = x_{T^{l-1}} - x$ be the error in the approximation from the previous iteration, and let $R$ be the support set of $r$, so that $|R| \le 2k$.



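Lemma 19. After the identification step, we have

\[ \|r_{\Omega^c}\|_2 \le \left(\frac{L(2k,n,N) + U(2k,n,N) + L(4k,n,N) + U(4k,n,N)}{2(1 - L(2k,n,N))}\right)\|r\|_2 + \frac{2\sqrt{1 + U(2k,n,N)}}{1 - L(2k,n,N)}\|e\|_2. \]

Proof. By Lemma 15 we have

\[ \|y_{(\Omega - R)}\|_2 \le \frac{1}{2}\left(L(4k,n,N) + U(4k,n,N)\right)\|r\|_2 + \sqrt{1 + U(2k,n,N)}\,\|e\|_2, \]

and

\[ \|y_{(R - \Omega)}\|_2 \ge (1 - L(2k,n,N))\|r_{(R - \Omega)}\|_2 - \frac{1}{2}\left(L(2k,n,N) + U(2k,n,N)\right)\|r\|_2 - \sqrt{1 + U(2k,n,N)}\,\|e\|_2. \]

Since $\Omega$ collects the $2k$ largest magnitude entries, $\|y_{(R - \Omega)}\|_2 \le \|y_{(\Omega - R)}\|_2$, and the result now follows by rearrangement. ■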
Lemma 20. After the support merger step, we have

\[ \|x_{\tilde{T}^c}\|_2 \le \|r_{\Omega^c}\|_2. \]

Lemma 21. After the estimation step, we have

\[ \|x - \tilde{x}\|_2 \le \left(1 + \frac{L(4k,n,N) + U(4k,n,N)}{2(1 - L(3k,n,N))}\right)\|x_{\tilde{T}^c}\|_2 + \frac{1}{\sqrt{1 - L(3k,n,N)}}\|e\|_2. \]

Proof. Using Lemma 15 we have

\[ \|x_{\tilde{T}} - \tilde{x}_{\tilde{T}}\|_2 \le \frac{L(4k,n,N) + U(4k,n,N)}{2(1 - L(3k,n,N))}\|x_{\tilde{T}^c}\|_2 + \frac{1}{\sqrt{1 - L(3k,n,N)}}\|e\|_2, \]

which combines with $\|x - \tilde{x}\|_2 \le \|x_{\tilde{T}^c}\|_2 + \|x_{\tilde{T}} - \tilde{x}_{\tilde{T}}\|_2$ to give the required result. ■

Lemma 22. After the pruning step, we have

\[ \|x - x_{T^l}\|_2 \le 2\|x - \tilde{x}\|_2. \]

The preceding lemmas facilitate the proof of Theorem 5.



Proof (Theorem 5). By Lemmas 21 and 22, we have

\[ \|x - x_{T^l}\|_2 \le 2\|x - \tilde{x}\|_2 \le \left(2 + \frac{L(4k,n,N) + U(4k,n,N)}{1 - L(3k,n,N)}\right)\|x_{\tilde{T}^c}\|_2 + \frac{2}{\sqrt{1 - L(3k,n,N)}}\|e\|_2. \]

Now by Lemma 20, $\|x_{\tilde{T}^c}\|_2$ is bounded above by $\|r_{\Omega^c}\|_2$. Then applying Lemma 19 and simplifying, we obtain

\[ \|x_{T^l} - x\|_2 \le \mu^{csp}(k,n,N)\|x_{T^{l-1}} - x\|_2 + \xi^{csp}(k,n,N)\|e\|_2. \tag{A.18} \]

Given our assumption that $\mu^{csp}(k,n,N) < 1$, we can now prove a stronger statement, namely that for $l \ge 0$ we have

\[ \|x_{T^l} - x\|_2 \le \left[\mu^{csp}(k,n,N)\right]^l\|x\|_2 + \xi^{csp}(k,n,N)\left(\frac{1 - \left[\mu^{csp}(k,n,N)\right]^l}{1 - \mu^{csp}(k,n,N)}\right)\|e\|_2. \tag{A.19} \]

We proceed by induction. Assume the result holds for some $l \ge 0$. Then, applying the inductive hypothesis and (A.18), and writing $\mu = \mu^{csp}(k,n,N)$ and $\xi = \xi^{csp}(k,n,N)$ for brevity, we have

\[ \|x_{T^{l+1}} - x\|_2 \le \mu\|x_{T^l} - x\|_2 + \xi\|e\|_2 \]
\[ \le \mu\left(\mu^l\|x\|_2 + \xi\left(\frac{1 - \mu^l}{1 - \mu}\right)\|e\|_2\right) + \xi\|e\|_2 \]
\[ = \mu^{l+1}\|x\|_2 + \xi\left(\mu\,\frac{1 - \mu^l}{1 - \mu} + 1\right)\|e\|_2 \]
\[ = \mu^{l+1}\|x\|_2 + \xi\left(\frac{1 - \mu^{l+1}}{1 - \mu}\right)\|e\|_2, \]

and so the result is also true for $l + 1$, and so (A.19) holds for all $l \ge 0$ by induction. Finally, note that

\[ \xi^{csp}(k,n,N)\left(\frac{1 - \left[\mu^{csp}(k,n,N)\right]^l}{1 - \mu^{csp}(k,n,N)}\right) \le \frac{\xi^{csp}(k,n,N)}{1 - \mu^{csp}(k,n,N)} \]

for all $l \ge 0$, and also that if CoSaMP terminates after $l$ iterations we have $\hat{x} = x_{T^l}$. ■
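The induction above amounts to unrolling the one-step recursion $E_{l+1} \le \mu E_l + \xi\|e\|_2$ into a geometric sum. The sketch below checks that unrolling numerically. It is illustrative only: `iterate_bound` and `closed_form` are hypothetical names, the starting value $E_0 = \|x\|_2$ assumes the initial approximation is zero, and the sample values of $\mu$, $\xi$ and the norms are arbitrary.

```python
def iterate_bound(mu: float, xi: float, x_norm: float, e_norm: float, steps: int) -> float:
    """Apply E <- mu * E + xi * ||e|| repeatedly, starting from E = ||x||."""
    err = x_norm  # assumption: the iteration starts from the zero vector
    for _ in range(steps):
        err = mu * err + xi * e_norm
    return err


def closed_form(mu: float, xi: float, x_norm: float, e_norm: float, l: int) -> float:
    """Geometric-sum form: mu^l ||x|| + xi (1 - mu^l) / (1 - mu) ||e||."""
    return mu ** l * x_norm + xi * (1 - mu ** l) / (1 - mu) * e_norm


# Arbitrary sample values with mu < 1.
mu, xi, x_norm, e_norm, l = 0.6, 3.0, 2.0, 0.1, 25
unrolled = iterate_bound(mu, xi, x_norm, e_norm, l)
geometric = closed_form(mu, xi, x_norm, e_norm, l)
limit = mu ** l * x_norm + xi / (1 - mu) * e_norm
```

The two expressions agree up to rounding, and both sit below the iteration-independent bound with factor $\xi/(1-\mu)$ used in the final step of the proof.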

Having established the results of Section 2 for CoSaMP, we now focus on Gaussian random matrices and prove the results from Section 3 concerning CoSaMP.

Proof (Theorem 10). Let $x$, $y$, $A$ and $e$ satisfy the hypothesis of Theorem 10 and select $\epsilon > 0$. Fix $\tau < 1$ and let

\[ z(k,n,N) = [L(2k,n,N), L(3k,n,N), L(4k,n,N), U(2k,n,N), U(4k,n,N)] \]
\[ \text{and} \quad z(\delta,\rho) = [L(\delta,2\rho), L(\delta,3\rho), L(\delta,4\rho), U(\delta,2\rho), U(\delta,4\rho)]. \]

Define $Z = (0,\tau)^3 \times (0,\infty)^2$ and define the functions $F^{csp}, G^{csp} : Z \to \mathbb{R}$:

\[ F^{csp}[z] := F^{csp}[z_1,\ldots,z_5] = \frac{1}{2}\left(2 + \frac{z_3 + z_5}{1 - z_2}\right)\left(\frac{z_1 + z_4 + z_3 + z_5}{1 - z_1}\right), \tag{A.20} \]
\[ G^{csp}[z] := G^{csp}[z_1,\ldots,z_5] = 2\left(\left(2 + \frac{z_3 + z_5}{1 - z_2}\right)\frac{\sqrt{1 + z_4}}{1 - z_1} + \frac{1}{\sqrt{1 - z_2}}\right). \tag{A.21} \]

Clearly, $(\nabla F^{csp}[t])_i \ge 0$ for all $i = 1,\ldots,5$ and

\[ (\nabla F^{csp}[t])_1 = \frac{1}{2}\left(2 + \frac{t_3 + t_5}{1 - t_2}\right)\left(\frac{1 + t_4 + t_3 + t_5}{(1 - t_1)^2}\right) > 0. \]

Hence the hypotheses of Lemma 16(ii) are satisfied for $F^{csp}$. By (10), (23) and (A.20), $F^{csp}[z(k,n,N)] = \mu^{csp}(k,n,N)$ and $F^{csp}[z(\delta,\rho)] = \mu^{csp}(\delta,\rho)$. Thus, by Lemma 16, as $(k,n,N) \to \infty$ with $\frac{n}{N} \to \delta$, $\frac{k}{n} \to \rho$,

\[ \mathrm{Prob}\left(\mu^{csp}(k,n,N) < \mu^{csp}(\delta,(1+\epsilon)\rho)\right) \to 1. \tag{A.22} \]

Also, $\mu^{csp}(\delta,\rho)$ is strictly increasing in $\rho$ and so Lemma 17 applies.

Similarly, $G^{csp}$ satisfies the hypotheses of Lemma 16(ii). Likewise, by (11), (24) and (A.21), $G^{csp}[z(k,n,N)] = \xi^{csp}(k,n,N)$ and $G^{csp}[z(\delta,\rho)] = \xi^{csp}(\delta,\rho)$. Again, by Lemma 16, as $(k,n,N) \to \infty$ with $\frac{n}{N} \to \delta$, $\frac{k}{n} \to \rho$,

\[ \mathrm{Prob}\left(\xi^{csp}(k,n,N) < \xi^{csp}(\delta,(1+\epsilon)\rho)\right) \to 1. \tag{A.23} \]

Therefore, for any $x \in \chi^N(k)$ and any noise vector $e$, as $(k,n,N) \to \infty$ with $\frac{n}{N} \to \delta$, $\frac{k}{n} \to \rho$, there is an exponentially high probability on the draw of a matrix $A$ with Gaussian i.i.d. entries that

\[ \left[\mu^{csp}(k,n,N)\right]^l\|x\|_2 + \frac{\xi^{csp}(k,n,N)}{1 - \mu^{csp}(k,n,N)}\|e\|_2 \le \left[\mu^{csp}(\delta,(1+\epsilon)\rho)\right]^l\|x\|_2 + \frac{\xi^{csp}(\delta,(1+\epsilon)\rho)}{1 - \mu^{csp}(\delta,(1+\epsilon)\rho)}\|e\|_2. \tag{A.24} \]

Combining (A.24) with Theorem 5 completes the argument.



Appendix A.3. Proofs for Subspace Pursuit

In this section we outline the proofs for the results in Section 2 and then prove the results in Section 3 reported for SP [12]. The proofs mimic those of Dai and Milenkovic while employing the aRIP constants. In each proof, the smallest possible support is retained for the aRIP constants in order to acquire from this method of analysis the best possible conditions on the measurement matrix used in the SP algorithm.

The index set T defines the support of the target signal x; T = supp(x). For this section, the 
index sets T l ,T l ,T lzizl and the vectors defined by SP, Algorithm^! 
We begin in the setting of an arbitrary measurement matrix $A$ of size $n \times N$ and formulate the aRIP conditions of Theorem 6. A sequence of lemmas leads us to Theorem 6. Lemmas 23 and 24 directly follow the proofs from [12, Theorem 10] with the adaptation that we employ the aRIP constants from Definition 1 and Lemma 15, and we maintain the smallest support size in $L(\cdot,n,N)$, $U(\cdot,n,N)$.



Lemma 23. For $x \in \chi^N(k)$ and $y = Ax + e$, after iteration $l$ of SP

\[ \|x_{T - \tilde{T}^l}\|_2 \le \frac{2U(3k,n,N)}{1 - L(k,n,N)}\left(1 + \frac{U(2k,n,N)}{1 - L(k,n,N)}\right)\|x_{T - T^{l-1}}\|_2 + \frac{2\sqrt{1 + U(k,n,N)}}{1 - L(k,n,N)}\|e\|_2. \tag{A.25} \]

Lemma 24. For $x \in \chi^N(k)$ and $y = Ax + e$, after iteration $l$ of SP

\[ \|x_{T - T^l}\|_2 \le \left(1 + \frac{2U(3k,n,N)}{1 - L(2k,n,N)}\right)\|x_{T - \tilde{T}^l}\|_2 + \frac{2}{\sqrt{1 - L(2k,n,N)}}\|e\|_2. \tag{A.26} \]

The following lemma is an adaptation of [12, Lemma 3]. By using Definition 1 and selecting the smallest possible support sizes for the aRIP constants, we arrive at Lemma 25.

Lemma 25. Let $x \in \chi^N(k)$ and $y = Ax + e$ be the measurement contaminated with noise $e$. If the Subspace Pursuit algorithm terminates after $l$ iterations, the output $\hat{x}$ approximates $x$ within the bounds

\[ \|x - \hat{x}\|_2 \le \left(1 + \frac{U(2k,n,N)}{1 - L(k,n,N)}\right)\|x_{T - T^l}\|_2 + \frac{\sqrt{1 + U(k,n,N)}}{1 - L(k,n,N)}\|e\|_2. \tag{A.27} \]

Lemmas 23-25 combine to prove Theorem 6.

Proof (Theorem [6]). After applying Lemma [23] to Lemma 24 we bound the entries of x that 
have not been captured by Algorithm [2j namely 



where 



cj) sp {k,n,N) 



\x T -tA\2 < H sp (k,n,N) \\x T _ T i-i\\ + <p sp (k,n, N)\\e\ 



2\A + U(k, n, N) f 2U(3k,n,N) \ 

l-L(k,n,N) V 1 - L(2k, n, N) J J\ 



a/1 - L(2k, n, N) ' 



(A.28) 



(A.29) 



Applying (A.28) iteratively, we develop a bound in terms of the norm of $x$, by observing that $\|x_{T-T^0}\|_2 \le \|x\|_2$:

$$\|x_{T-T^l}\|_2 \le \left[\mu^{sp}(k,n,N)\right]^l\|x\|_2 + \frac{\phi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)}\|e\|_2. \tag{A.30}$$

The factor $\frac{\phi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)}$ amplifying $\|e\|_2$ in (A.30) is found by induction as in the proof of Theorem 5 in Appendix A.2.



From Lemma 25 with $\kappa^{sp}(k,n,N) = 1 + \frac{U(2k,n,N)}{1-L(k,n,N)}$ we have

$$\|x-\hat{x}\|_2 \le \kappa^{sp}(k,n,N)\,\|x_{T-T^l}\|_2 + \frac{\sqrt{1+U(k,n,N)}}{1-L(k,n,N)}\|e\|_2. \tag{A.31}$$



Applying (A.30) to (A.31),

$$\|x-\hat{x}\|_2 \le \kappa^{sp}(k,n,N)\left[\mu^{sp}(k,n,N)\right]^l\|x\|_2 + \left(\kappa^{sp}(k,n,N)\frac{\phi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)} + \frac{\sqrt{1+U(k,n,N)}}{1-L(k,n,N)}\right)\|e\|_2. \tag{A.32}$$

From (14), we verify that

$$\frac{\xi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)} = \kappa^{sp}(k,n,N)\frac{\phi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)} + \frac{\sqrt{1+U(k,n,N)}}{1-L(k,n,N)}, \tag{A.33}$$

which completes the proof. ■
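The chain (A.28)-(A.33) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the sample aRIP values are hypothetical, and $\xi^{sp}$ is obtained from the identity (A.33) rather than from its closed form (14).

```python
import math

def sp_constants(L1, L2, U1, U2, U3):
    """Return (mu, kappa, phi, xi) for SP from assumed aRIP values.

    L1 = L(k), L2 = L(2k), U1 = U(k), U2 = U(2k), U3 = U(3k).
    """
    kappa = 1.0 + U2 / (1.0 - L1)                   # factor of Lemma 25
    growth = 1.0 + 2.0 * U3 / (1.0 - L2)            # factor of Lemma 24
    mu = 2.0 * U3 / (1.0 - L1) * kappa * growth     # Lemma 23 composed with Lemma 24
    noise = math.sqrt(1.0 + U1) / (1.0 - L1)
    phi = 2.0 * noise * growth + 2.0 / math.sqrt(1.0 - L2)   # (A.29)
    xi = kappa * phi + (1.0 - mu) * noise           # rearranged (A.33)
    return mu, kappa, phi, xi

def iterate_residual(mu, phi, x_norm, e_norm, steps):
    """Apply (A.28) with equality `steps` times, starting from ||x||_2."""
    r = x_norm
    for _ in range(steps):
        r = mu * r + phi * e_norm
    return r

def residual_bound(mu, phi, x_norm, e_norm, steps):
    """Right-hand side of (A.30); meaningful only when mu < 1."""
    return mu ** steps * x_norm + phi / (1.0 - mu) * e_norm

# Hypothetical aRIP values, chosen only so that mu < 1.
mu, kappa, phi, xi = sp_constants(0.05, 0.08, 0.05, 0.08, 0.10)
```

With these sample values the contraction factor is roughly 0.28, so the signal term decays geometrically while the noise term stays below $\phi^{sp}/(1-\mu^{sp})$ times $\|e\|_2$.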

Having established the aRIP conditions for an arbitrary measurement matrix, we again return to the Gaussian random matrix ensemble and establish the quantitative bounds for SP from Section 3.



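Proof (Theorem 11). Let $x$, $y$, $A$, and $e$ satisfy the hypothesis of Theorem 11 and select $\epsilon > 0$. Fix $\tau < 1$ and let

$$z(k,n,N) = [L(k,n,N),\, L(2k,n,N),\, U(k,n,N),\, U(2k,n,N),\, U(3k,n,N)]$$

and

$$z(\delta,\rho) = [L(\delta,\rho),\, L(\delta,2\rho),\, U(\delta,\rho),\, U(\delta,2\rho),\, U(\delta,3\rho)].$$

Define $Z = (0,\tau)^2 \times (0,\infty)^3$ and define the following functions mapping $Z \to \mathbb{R}$:

$$F^{sp}[z] := F^{sp}[z_1,\ldots,z_5] = \frac{2z_5}{1-z_1}\left(1+\frac{2z_5}{1-z_2}\right)\left(1+\frac{z_4}{1-z_1}\right), \tag{A.34}$$

$$K[z] := K[z_1,\ldots,z_5] = 1+\frac{z_4}{1-z_1}, \tag{A.35}$$

$$G^{sp}[z] := G^{sp}[z_1,\ldots,z_5] = \frac{2\sqrt{1+z_3}}{1-z_1}\left(1+\frac{2z_5}{1-z_2}\right) + \frac{2}{\sqrt{1-z_2}}, \tag{A.36}$$

$$H[z] := H[z_1,\ldots,z_5] = \frac{\sqrt{1+z_3}}{1-z_1}. \tag{A.37}$$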

For each of these functions, the gradient is clearly nonnegative componentwise on $Z$, with the first entry of each gradient strictly positive, which is sufficient to verify the hypotheses of Lemma 16 (ii).
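The monotonicity claim can be spot-checked with forward differences. This is only a sketch under stated assumptions: the helper names, the cap on the unbounded coordinates, the value of $\tau$, and the step size are all hypothetical choices, and a finite sample is evidence rather than a proof.

```python
import math
import random

def F_sp(z):
    z1, z2, _, z4, z5 = z
    return 2 * z5 / (1 - z1) * (1 + 2 * z5 / (1 - z2)) * (1 + z4 / (1 - z1))

def K(z):
    z1, _, _, z4, _ = z
    return 1 + z4 / (1 - z1)

def G_sp(z):
    z1, z2, z3, _, z5 = z
    return 2 * math.sqrt(1 + z3) / (1 - z1) * (1 + 2 * z5 / (1 - z2)) + 2 / math.sqrt(1 - z2)

def H(z):
    z1, _, z3, _, _ = z
    return math.sqrt(1 + z3) / (1 - z1)

def monotone_at(f, z, h=1e-6):
    """Forward differences: nondecreasing in every entry, strictly increasing in z_1."""
    base = f(z)
    bumped_values = []
    for i in range(len(z)):
        bumped = list(z)
        bumped[i] += h
        bumped_values.append(f(bumped))
    nondecreasing = all(v >= base - 1e-12 for v in bumped_values)
    return nondecreasing and bumped_values[0] > base

def sample_points(count, tau=0.9, cap=5.0, seed=0):
    """Seeded sample from a bounded part of Z = (0, tau)^2 x (0, inf)^3."""
    rng = random.Random(seed)
    return [
        [rng.uniform(0.01, tau - 0.01), rng.uniform(0.01, tau - 0.01),
         rng.uniform(0.01, cap), rng.uniform(0.01, cap), rng.uniform(0.01, cap)]
        for _ in range(count)
    ]
```

Calling `monotone_at(f, z)` for each of the four functions over `sample_points(200)` exercises the hypothesis of Lemma 16 (ii) on that sample.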



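Moreover, from (12)-(14) and (25)-(27), we have

$$\kappa^{sp}(k,n,N)\,\mu^{sp}(k,n,N) = K[z(k,n,N)]\,F^{sp}[z(k,n,N)],$$

$$\kappa^{sp}(\delta,\rho)\,\mu^{sp}(\delta,\rho) = K[z(\delta,\rho)]\,F^{sp}[z(\delta,\rho)],$$

$$\frac{\xi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)} = K[z(k,n,N)]\,\frac{G^{sp}[z(k,n,N)]}{1-F^{sp}[z(k,n,N)]} + H[z(k,n,N)],$$

$$\frac{\xi^{sp}(\delta,\rho)}{1-\mu^{sp}(\delta,\rho)} = K[z(\delta,\rho)]\,\frac{G^{sp}[z(\delta,\rho)]}{1-F^{sp}[z(\delta,\rho)]} + H[z(\delta,\rho)].$$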

Invoking Lemma 16 for each of the functions in (A.34)-(A.37) yields that, with high probability on the draw of $A$ from a Gaussian distribution,

$$\kappa^{sp}(k,n,N)\left[\mu^{sp}(k,n,N)\right]^l\|x\|_2 < \kappa^{sp}(\delta,(1+\epsilon)\rho)\left[\mu^{sp}(\delta,(1+\epsilon)\rho)\right]^l\|x\|_2, \tag{A.38}$$

$$\frac{\xi^{sp}(k,n,N)}{1-\mu^{sp}(k,n,N)}\|e\|_2 < \frac{\xi^{sp}(\delta,(1+\epsilon)\rho)}{1-\mu^{sp}(\delta,(1+\epsilon)\rho)}\|e\|_2. \tag{A.39}$$

Combining (A.38) and (A.39) with Theorem 6 completes the argument, recalling that Lemma 16 applied to $F^{sp} = \mu^{sp}$ also implies that $\mu^{sp}(\delta,\rho)$ is strictly increasing in $\rho$ and so Lemma 17 holds. ■

Appendix A.4. Proofs for Iterative Hard Thresholding

In this section we first outline a proof of Theorem 7, which follows similar lines to that given by Blumensath and Davies in [7, Corollary 4], while considering a generalization to aRIP bounds and also incorporating a stepsize $\omega$. Having established this result for arbitrary measurement matrices, we then go on to prove Theorem 12, which gives conditions for high-probability convergence of IHT in the specific case of Gaussian random matrices.

Proof (Theorem 7). Let $B^l = T^l \cup \mathrm{supp}(x)$. Since $|B^l| \le 2k \le 3k$, we can deduce from Lemma 15 that

$$\|(I-\omega A^*_{B^l}A_{B^l})(x^{l-1}_{T^{l-1}}-x)_{B^l}\|_2 \le \phi^{iht}(3k,n,N)\,\|(x^{l-1}_{T^{l-1}}-x)_{B^l}\|_2, \tag{A.40}$$

where $\phi^{iht}(3k,n,N)$ is defined to be

$$\phi^{iht}(3k,n,N) = \max\{\omega[1+U(3k,n,N)]-1,\; 1-\omega[1-L(3k,n,N)]\}.$$

Furthermore, we have

$$\left(\omega A^*_{B^l}A_{(B^{l-1}-B^l)}\right) \subset \left(\omega A^*_{B^l\cup B^{l-1}}A_{B^l\cup B^{l-1}} - I\right).$$

Since the eigenvalues of a submatrix are bounded in magnitude by the eigenvalues of the entire matrix, and since $|B^l \cup B^{l-1}| \le 3k$, we can again invoke Lemma 15 to obtain

$$\|\omega A^*_{B^l}A_{(B^{l-1}-B^l)}(x^{l-1}_{T^{l-1}}-x)_{(B^{l-1}-B^l)}\|_2 \le \phi^{iht}(3k,n,N)\,\|(x^{l-1}_{T^{l-1}}-x)_{(B^{l-1}-B^l)}\|_2. \tag{A.41}$$
Now we have from the proof of [7, Corollary 4] that

$$\|x^l_{T^l}-x\|_2 \le 2\|(I-\omega A^*_{B^l}A_{B^l})(x^{l-1}_{T^{l-1}}-x)_{B^l}\|_2 + 2\|\omega A^*_{B^l}A_{(B^{l-1}-B^l)}(x^{l-1}_{T^{l-1}}-x)_{(B^{l-1}-B^l)}\|_2 + 2\|\omega A^*_{B^l}e\|_2. \tag{A.42}$$

Substituting (A.40) and (A.41) into (A.42), and applying Lemma 15 to the error term, we obtain

$$\|x^l_{T^l}-x\|_2 \le 2\phi^{iht}(3k,n,N)\left(\|(x^{l-1}_{T^{l-1}}-x)_{B^l}\|_2 + \|(x^{l-1}_{T^{l-1}}-x)_{(B^{l-1}-B^l)}\|_2\right) + 2\omega\sqrt{1+U(2k,n,N)}\,\|e\|_2.$$



Now $B^l$ and $(B^{l-1}-B^l)$ are disjoint, so we have

$$\|(x^{l-1}_{T^{l-1}}-x)_{B^l}\|_2 + \|(x^{l-1}_{T^{l-1}}-x)_{(B^{l-1}-B^l)}\|_2 \le \sqrt{2}\,\|(x^{l-1}_{T^{l-1}}-x)_{B^l\cup(B^{l-1}-B^l)}\|_2,$$

from which it now follows that

$$\|x^l_{T^l}-x\|_2 \le \mu^{iht}(k,n,N)\,\|x^{l-1}_{T^{l-1}}-x\|_2 + \xi^{iht}(k,n,N)\,\|e\|_2,$$

with $\mu^{iht}(k,n,N)$ and $\xi^{iht}(k,n,N)$ defined in (15) and (16), respectively. Given our assumption that $\mu^{iht}(k,n,N) < 1$, an induction argument analogous to the induction in the proof of Theorem 5 gives the stronger result

$$\|x^l_{T^l}-x\|_2 \le \left[\mu^{iht}(k,n,N)\right]^l\|x\|_2 + \xi^{iht}(k,n,N)\left(\frac{1-\left[\mu^{iht}(k,n,N)\right]^l}{1-\mu^{iht}(k,n,N)}\right)\|e\|_2.$$



We finally note that if IHT terminates after $l$ iterations we have $\hat{x} = x^l_{T^l}$, from which the result now follows. ■
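For orientation, the iteration analysed above can be sketched in a few lines. This is an assumption-laden illustration rather than the paper's Algorithm 3: it uses the standard hard thresholding update $x \leftarrow H_k\big(x + \omega A^*(y - Ax)\big)$ for a real matrix stored as a list of rows, starts from zero, and runs a fixed number of iterations; the function name `iht` and its parameters are hypothetical.

```python
def iht(A, y, k, omega=1.0, iterations=50):
    """Sketch of iterative hard thresholding with stepsize omega.

    A is a real n x N matrix given as a list of rows, y has length n,
    and k is the number of entries kept after each step.
    """
    n, N = len(A), len(A[0])
    x = [0.0] * N
    for _ in range(iterations):
        # residual y - A x
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(N)) for i in range(n)]
        # gradient-type step x + omega * A^T residual
        v = [x[j] + omega * sum(A[i][j] * residual[i] for i in range(n)) for j in range(N)]
        # hard threshold: keep the k entries of largest magnitude
        support = set(sorted(range(N), key=lambda j: abs(v[j]), reverse=True)[:k])
        x = [v[j] if j in support else 0.0 for j in range(N)]
    return x

# Toy check with A = I, where U = L = 0 and the contraction is |1 - omega|.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
target = [0.0, 3.0, 0.0, -2.0, 0.0]
```

In the identity example the error shrinks by the factor $\max\{\omega - 1,\, 1 - \omega\}$ per iteration, which is $\phi^{iht}$ evaluated at $U = L = 0$; the toy matrix is square, so it illustrates only the contraction, not the underdetermined setting.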

Armed with the results of Section 2 for IHT, we return to the family of Gaussian random matrices and prove the quantitative bounds for IHT from Section 3.

Proof (Theorem 12). Let $x$, $y$, $A$ and $e$ satisfy the hypothesis of Theorem 12 and select $\epsilon > 0$. Fix $\tau < 1$ and let

$$z(k,n,N) = [L(3k,n,N),\, U(2k,n,N),\, U(3k,n,N)]$$

and

$$z(\delta,\rho) = [L(\delta,3\rho),\, U(\delta,2\rho),\, U(\delta,3\rho)].$$

Define $Z = (0,\tau) \times (0,\infty)^2$. For an arbitrary weight $\omega \in (0,1)$, define the functions $F^{iht}_\omega, G^{iht}_\omega : Z \to \mathbb{R}$:

$$F^{iht}_\omega[z] := F^{iht}_\omega[z_1,z_2,z_3] = 2\sqrt{2}\,\max\{\omega[1+z_3]-1,\; 1-\omega[1-z_1]\}, \tag{A.43}$$

$$G^{iht}_\omega[z] := G^{iht}_\omega[z_1,z_2,z_3] = \frac{2\omega\sqrt{1+z_2}}{1-2\sqrt{2}\,\max\{\omega[1+z_3]-1,\; 1-\omega[1-z_1]\}}. \tag{A.44}$$



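[Note that $F^{iht}_\omega[z(k,n,N)] = \mu^{iht}(k,n,N)$ and $G^{iht}_\omega[z(k,n,N)] = \xi^{iht}(k,n,N)/(1-\mu^{iht}(k,n,N))$ due to (15) and (16).] Clearly the functions are nondecreasing so that, with any $t \in Z$, $(\nabla F^{iht}_\omega[t])_i \ge 0$ and $(\nabla G^{iht}_\omega[t])_i \ge 0$ for $i = 1, 2, 3$; note that $F^{iht}_\omega[t]$ and $G^{iht}_\omega[t]$ have points of nondifferentiability, but that the left and right derivatives at those points remain nonnegative. Also, for any $v \in Z$, since $t_i, v_i > 0$ for each $i$, $\nabla F^{iht}_\omega[t] \cdot v \ge 0$ and $\nabla G^{iht}_\omega[t] \cdot v \ge 0$, as both functions clearly increase when each component of the argument increases. Hence, $F^{iht}_\omega$ and $G^{iht}_\omega$ satisfy the hypotheses of Lemma 16 (i). Therefore, for any $\omega \in (0,1)$, as $(k,n,N) \to \infty$ with $n/N \to \delta$, $k/n \to \rho$,

$$\mathrm{Prob}\left(F^{iht}_\omega[z(k,n,N)] < F^{iht}_\omega[z(\delta,\rho)+\mathbf{1}c\epsilon]\right) \to 1, \tag{A.45}$$

$$\mathrm{Prob}\left(G^{iht}_\omega[z(k,n,N)] < G^{iht}_\omega[z(\delta,\rho)+\mathbf{1}c\epsilon]\right) \to 1. \tag{A.46}$$

Now fix $\omega^* := \dfrac{2}{2+U(\delta,3\rho)-L(\delta,3\rho)}$ and define

$$\tilde{F}^{iht}[z] := \tilde{F}^{iht}[z_1,z_2,z_3] = 2\sqrt{2}\,\frac{z_1+z_3}{2+z_3-z_1}, \tag{A.47}$$

$$\tilde{G}^{iht}[z] := \tilde{G}^{iht}[z_1,z_2,z_3] = \frac{4\sqrt{1+z_2}}{2-(2\sqrt{2}-1)z_3-(2\sqrt{2}+1)z_1}. \tag{A.48}$$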
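The role of the particular weight can be made concrete with a small numerical check. The sketch below is illustrative only: the function names and the sample point are hypothetical, and `omega_star` applies the formula $2/(2+z_3-z_1)$ to a generic triple $z = (z_1, z_2, z_3)$. At that weight the two arguments of the maximum in (A.43) coincide, so $F^{iht}_{\omega}$ and $G^{iht}_{\omega}$ reduce to the closed forms (A.47) and (A.48).

```python
import math

SQRT8 = 2 * math.sqrt(2)

def F_omega(omega, z):
    """(A.43): 2*sqrt(2) * max{omega(1+z3) - 1, 1 - omega(1-z1)}."""
    z1, _, z3 = z
    return SQRT8 * max(omega * (1 + z3) - 1, 1 - omega * (1 - z1))

def G_omega(omega, z):
    """(A.44): 2*omega*sqrt(1+z2) / (1 - F_omega)."""
    _, z2, _ = z
    return 2 * omega * math.sqrt(1 + z2) / (1 - F_omega(omega, z))

def F_tilde(z):
    """(A.47)."""
    z1, _, z3 = z
    return SQRT8 * (z1 + z3) / (2 + z3 - z1)

def G_tilde(z):
    """(A.48)."""
    z1, z2, z3 = z
    return 4 * math.sqrt(1 + z2) / (2 - (SQRT8 - 1) * z3 - (SQRT8 + 1) * z1)

def omega_star(z):
    """Weight that balances the two arguments of the max in (A.43)."""
    z1, _, z3 = z
    return 2 / (2 + z3 - z1)

# Hypothetical sample point (z1, z2, z3) and a uniform shift of every entry.
z = (0.10, 0.15, 0.20)
shifted = tuple(zi + 0.01 for zi in z)
w = omega_star(z)
```

Because one argument of the maximum increases in $\omega$ and the other decreases, the balancing weight also minimises $F^{iht}_\omega[z]$ over $\omega$; evaluating both function pairs at `shifted` while keeping `w` fixed shows that the agreement survives adding the same constant to every entry of $z$.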



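Then for any $t \in Z$, $(\nabla\tilde{F}^{iht}[t])_i > 0$ for $i = 1, 3$ and $(\nabla\tilde{F}^{iht}[t])_2 = 0$. Likewise, $(\nabla\tilde{G}^{iht}[t])_i > 0$ for $i = 1, 2, 3$. Thus $\tilde{F}^{iht}$ and $\tilde{G}^{iht}$ satisfy the hypotheses of Lemma 16 (ii) and, therefore,

$$\tilde{F}^{iht}[z(\delta,\rho)+\mathbf{1}c\epsilon] < \tilde{F}^{iht}[z(\delta,(1+\epsilon)\rho)], \tag{A.49}$$

$$\tilde{G}^{iht}[z(\delta,\rho)+\mathbf{1}c\epsilon] < \tilde{G}^{iht}[z(\delta,(1+\epsilon)\rho)]. \tag{A.50}$$

Finally, observe that

$$F^{iht}_{\omega^*}[z(\delta,\rho)+\mathbf{1}c\epsilon] = \tilde{F}^{iht}[z(\delta,\rho)+\mathbf{1}c\epsilon], \tag{A.51}$$

$$G^{iht}_{\omega^*}[z(\delta,\rho)+\mathbf{1}c\epsilon] = \tilde{G}^{iht}[z(\delta,\rho)+\mathbf{1}c\epsilon]. \tag{A.52}$$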



In (A.45) and (A.46), the weight was arbitrary; thus both statements certainly hold for the particular weight $\omega^*$. Therefore, combining (A.45), (A.49), (A.51) and combining (A.46), (A.50), (A.52) imply that, with exponentially high probability on the draw of $A$,

$$F^{iht}_{\omega^*}[z(k,n,N)] < \tilde{F}^{iht}[z(\delta,(1+\epsilon)\rho)], \tag{A.53}$$

$$G^{iht}_{\omega^*}[z(k,n,N)] < \tilde{G}^{iht}[z(\delta,(1+\epsilon)\rho)]. \tag{A.54}$$

Therefore, with the weight $\omega^*$, there is an exponentially high probability on the draw of $A$ from a Gaussian distribution that

$$\mu^{iht}(k,n,N) = F^{iht}_{\omega^*}[z(k,n,N)] < \tilde{F}^{iht}[z(\delta,(1+\epsilon)\rho)] = \mu^{iht}(\delta,(1+\epsilon)\rho), \tag{A.55}$$

$$\frac{\xi^{iht}(k,n,N)}{1-\mu^{iht}(k,n,N)} = G^{iht}_{\omega^*}[z(k,n,N)] < \tilde{G}^{iht}[z(\delta,(1+\epsilon)\rho)] = \frac{\xi^{iht}(\delta,(1+\epsilon)\rho)}{1-\mu^{iht}(\delta,(1+\epsilon)\rho)}, \tag{A.56}$$

where we also employed (15), (16) with $\omega = \omega^*$, and (28), (29). The result follows by invoking Theorem 7 and applying (A.55) and (A.56); recall also that Lemma 17 holds since $\mu^{iht}(\delta,\rho) = \tilde{F}^{iht}(z(\delta,\rho))$ is implied to be strictly increasing in $\rho$ by Lemma 16 (ii). ■